Data-driven predictive solutions, which predominate in commercial applications, tend to suffer from biases and stereotypes, raising equity concerns. Prediction models may discover, use, or amplify spurious correlations based on gender or other protected personal characteristics, thus discriminating against marginalized groups. Mitigating gender bias has become an important research focus in natural language processing (NLP), and it is an area for which annotated corpora are available. Data augmentation reduces gender bias by adding counterfactual examples to the training dataset. In this work, we show that some of the examples in the augmented dataset can be unimportant or even harmful to fairness. We therefore propose a general method for pruning both the factual and counterfactual examples to maximize the model's fairness as measured by demographic parity, equality of opportunity, and equality of odds. The fairness achieved by our method surpasses that of data augmentation on three text classification datasets, while using no more than half of the examples in the augmented dataset. Our experiments are conducted with models of varying sizes and pre-training settings.
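As a concrete illustration of the augmentation step, a minimal sketch of gender-swap counterfactual data augmentation might look as follows. The word list and the `counterfactual`/`augment` helpers are illustrative assumptions for this sketch, not the exact procedure evaluated in our experiments (which also ignore morphology and case handling here for brevity).

```python
# Minimal sketch of counterfactual data augmentation (CDA) for gender:
# every factual training example is paired with a copy in which gendered
# terms are swapped. The word list and helpers are illustrative assumptions.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

def counterfactual(text: str) -> str:
    """Return a counterfactual version of a text by swapping gendered words."""
    # Lookup is lowercase-only; capitalization and inflection are ignored here.
    return " ".join(GENDER_PAIRS.get(tok.lower(), tok) for tok in text.split())

def augment(dataset):
    """Return the union of factual examples and their counterfactual counterparts."""
    augmented = []
    for text, label in dataset:
        augmented.append((text, label))                  # factual example
        augmented.append((counterfactual(text), label))  # counterfactual example
    return augmented
```

The pruning method proposed in this work operates on such an augmented set, removing both factual and counterfactual examples that do not help, or actively hurt, the chosen fairness criterion.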