Pre-trained language models encode undesirable social biases, which are further exacerbated in downstream use. To address this, we propose MABEL (a Method for Attenuating Gender Bias using Entailment Labels), an intermediate pre-training approach for mitigating gender bias in contextualized representations. Key to our approach is the use of a contrastive learning objective on counterfactually augmented, gender-balanced entailment pairs from natural language inference (NLI) datasets. We also introduce an alignment regularizer that pulls each entailment pair and its gender-flipped counterpart closer together. We extensively evaluate our approach on intrinsic and extrinsic metrics, and show that MABEL outperforms previous task-agnostic debiasing approaches in terms of fairness while preserving task performance after fine-tuning on downstream tasks. Together, these findings demonstrate that labeled NLI data is an effective resource for bias mitigation, in contrast to prior work that relies only on unlabeled sentences. Finally, we observe that existing approaches often use evaluation settings that are insufficient or inconsistent. We make an effort to reproduce and compare previous methods, and call for unified evaluation settings across gender-debiasing methods to enable better comparison in future work.
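To make the training signal concrete, below is a minimal sketch of the two components named above: a contrastive objective over gender-balanced entailment pairs and an alignment regularizer between a pair and its gender-flipped counterfactual. All names here (`GENDER_MAP`, `counterfactual`, `mabel_style_loss`, the choice of InfoNCE with in-batch negatives, and the MSE form of the regularizer) are illustrative assumptions, not the published MABEL implementation.

```python
# Illustrative sketch only; the actual MABEL objectives and
# implementation details may differ from what is shown here.
import torch
import torch.nn.functional as F

# Hypothetical word map for counterfactual augmentation.
GENDER_MAP = {"he": "she", "she": "he", "his": "her", "her": "his",
              "man": "woman", "woman": "man",
              "himself": "herself", "herself": "himself"}

def counterfactual(sentence: str) -> str:
    """Swap gendered terms to produce a gender-flipped counterfactual.
    Naive whitespace tokenization; a real pipeline would be more careful."""
    return " ".join(GENDER_MAP.get(tok, tok) for tok in sentence.split())

def mabel_style_loss(prem, hyp, prem_cf, hyp_cf, tau=0.05, lam=1.0):
    """Contrastive loss over entailment pairs plus an alignment regularizer.

    prem, hyp:       [B, d] encoder embeddings of premise / entailed hypothesis
    prem_cf, hyp_cf: [B, d] embeddings of their gender-flipped counterfactuals
    """
    # Gender-balanced batch: originals and counterfactuals together.
    p = torch.cat([prem, prem_cf], dim=0)  # [2B, d]
    h = torch.cat([hyp, hyp_cf], dim=0)    # [2B, d]

    # InfoNCE: each premise's entailed hypothesis is its positive;
    # all other hypotheses in the batch serve as in-batch negatives.
    sim = F.cosine_similarity(p.unsqueeze(1), h.unsqueeze(0), dim=-1) / tau
    labels = torch.arange(p.size(0), device=p.device)
    contrastive = F.cross_entropy(sim, labels)

    # Alignment regularizer: pull each sentence's representation toward
    # that of its gender-flipped twin.
    align = F.mse_loss(prem, prem_cf) + F.mse_loss(hyp, hyp_cf)

    return contrastive + lam * align
```

In practice the four embedding matrices would come from a shared transformer encoder run over the original and counterfactually augmented NLI pairs; the published method's exact loss terms and hyperparameters may differ from this sketch.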