Existing bias mitigation methods for DNN models primarily work on learning debiased encoders. This process not only requires a large amount of instance-level annotations for sensitive attributes, but also does not guarantee that all fairness-sensitive information has been removed from the encoder. To address these limitations, we explore the following research question: Can we reduce the discrimination of DNN models by only debiasing the classification head, even with biased representations as inputs? To this end, we propose a new mitigation technique, Representation Neutralization for Fairness (RNF), which achieves fairness by debiasing only the task-specific classification head of DNN models. Specifically, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. The key idea of RNF is to discourage the classification head from capturing spurious correlations between fairness-sensitive information in the encoder representations and specific class labels. To address low-resource settings with no access to sensitive attribute annotations, we leverage a bias-amplified model to generate proxy annotations for sensitive attributes. Experimental results over several benchmark datasets demonstrate that our RNF framework effectively reduces the discrimination of DNN models with minimal degradation in task-specific performance.
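To make the neutralization idea concrete, the following is a minimal sketch, assuming a PyTorch setup with a frozen (possibly biased) encoder, a trainable classification head, toy dimensions, and a simple cross-group pairing rule; the names and pairing logic are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of representation neutralization: train only the classification head
# on averaged representations of sample pairs that share a label but differ
# in the sensitive attribute (real or proxy). Illustrative, not the official code.
import torch
import torch.nn as nn

torch.manual_seed(0)

ENC_DIM, NUM_CLASSES = 16, 2
encoder = nn.Sequential(nn.Linear(8, ENC_DIM), nn.ReLU())  # stands in for a pretrained, biased encoder
for p in encoder.parameters():
    p.requires_grad = False                                # encoder stays fixed

head = nn.Linear(ENC_DIM, NUM_CLASSES)                     # only the head is trained/debiased
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Synthetic batch: features x, labels y, binary sensitive attribute a (or proxy annotations).
x = torch.randn(64, 8)
y = torch.randint(0, NUM_CLASSES, (64,))
a = torch.randint(0, 2, (64,))

def neutralize_batch(z, y, a):
    """Average representations of pairs with the same label but different
    sensitive attribute, so the head cannot rely on attribute-specific cues."""
    reps, labels = [], []
    for c in y.unique():
        g0 = z[(y == c) & (a == 0)]
        g1 = z[(y == c) & (a == 1)]
        n = min(len(g0), len(g1))
        if n == 0:
            continue
        reps.append(0.5 * (g0[:n] + g1[:n]))               # neutralized representation per pair
        labels.append(torch.full((n,), int(c)))
    return torch.cat(reps), torch.cat(labels)

# One training step for the classification head on neutralized representations.
with torch.no_grad():
    z = encoder(x)
z_neutral, y_neutral = neutralize_batch(z, y, a)
loss = criterion(head(z_neutral), y_neutral)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"head loss on neutralized representations: {loss.item():.4f}")
```

In this sketch, averaging the two groups' representations removes attribute-specific signal from the head's training inputs while preserving the label-relevant signal shared across groups, which is the spurious-correlation-breaking effect the abstract describes.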