Trained classification models can unintentionally encode biased representations and predictions, reinforcing societal preconceptions and stereotypes. Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise. In this paper, we propose a method for mitigating bias in classifier training by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations, while instances sharing a protected attribute are pushed further apart. In this way, our method learns representations in which the task label is captured in focused regions while the protected attribute is spread diffusely, limiting its impact on prediction and thereby yielding fairer models. Extensive experimental results across four tasks in NLP and computer vision show (a) that our proposed method achieves fairer representations and reduces bias compared with competitive baselines; (b) that it does so without sacrificing main-task performance; and (c) that it sets a new state of the art on one task despite reducing bias. Finally, our method is conceptually simple, agnostic to network architecture, and incurs minimal additional compute cost.
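To make the objective described above concrete, the following is a minimal sketch (not the authors' released implementation) of a batch-wise contrastive loss that pulls together instances sharing a task label and pushes apart instances sharing a protected attribute. The function name `fair_contrastive_loss`, the weighting parameter `lam`, and the `temperature` value are illustrative assumptions, not details stated in the abstract.

```python
# Hypothetical sketch of the contrastive debiasing objective described above.
# Assumes a batch of representations z, task labels y, and protected attributes g.
import torch
import torch.nn.functional as F

def fair_contrastive_loss(z, y, g, temperature=0.1, lam=1.0):
    """Encourage same-class pairs to be similar; penalise similarity
    between pairs sharing the protected attribute."""
    z = F.normalize(z, dim=1)                      # unit-length representations
    sim = z @ z.t() / temperature                  # pairwise cosine similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)

    # log-softmax over each row, excluding self-similarity
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')),
                                     dim=1, keepdim=True)

    same_y = (y.unsqueeze(0) == y.unsqueeze(1)) & ~eye   # same task label
    same_g = (g.unsqueeze(0) == g.unsqueeze(1)) & ~eye   # same protected attribute

    # attraction term: maximise log-probability of same-class pairs
    pos_loss = -(log_prob * same_y).sum(1) / same_y.sum(1).clamp(min=1)

    # repulsion term: drive down log-probability of same-attribute pairs
    neg_loss = (log_prob * same_g).sum(1) / same_g.sum(1).clamp(min=1)

    return (pos_loss + lam * neg_loss).mean()
```

In practice this loss would be added to the standard classification objective, so the combined training signal keeps task-relevant information while discouraging the representation from clustering by the protected attribute.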