Pretrained text encoders, such as BERT, have been increasingly applied to various natural language processing (NLP) tasks and have recently demonstrated significant performance gains. However, recent studies have demonstrated the existence of social bias in these pretrained NLP models. Although prior work has made progress on word-level debiasing, improving the sentence-level fairness of pretrained encoders remains underexplored. In this paper, we propose the first neural debiasing method for a pretrained sentence encoder, which transforms the pretrained encoder outputs into debiased representations via a fair filter (FairFil) network. To learn the FairFil, we introduce a contrastive learning framework that not only minimizes the correlation between filtered embeddings and bias words but also preserves the rich semantic information of the original sentences. On real-world datasets, our FairFil effectively reduces the bias degree of pretrained text encoders while consistently maintaining desirable performance on downstream tasks. Moreover, our post-hoc method does not require any retraining of the text encoders, which further broadens FairFil's application scope.
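To make the described setup concrete, below is a minimal PyTorch sketch of the general idea: a small filter network applied on top of frozen encoder outputs, trained with an InfoNCE-style contrastive loss between a sentence and a counterfactual (e.g., gender-swapped) version to preserve semantics, plus a simple decorrelation penalty against bias-word directions. The filter architecture, the counterfactual-pair construction, the `bias_correlation` penalty, and all hyperparameters are illustrative assumptions, not the paper's exact objective.

```python
# Minimal sketch of a FairFil-style post-hoc debiasing filter (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FairFilter(nn.Module):
    """Post-hoc filter mapping pretrained encoder outputs to debiased embeddings."""
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, z):
        return F.normalize(self.net(z), dim=-1)

def info_nce(a, b, temperature=0.1):
    """Contrastive loss: matched (sentence, counterfactual) pairs are positives."""
    logits = a @ b.t() / temperature           # (B, B) similarity matrix
    labels = torch.arange(a.size(0))           # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

def bias_correlation(z, bias_dirs):
    """Illustrative penalty on alignment between filtered embeddings and bias directions."""
    return (z @ bias_dirs.t()).pow(2).mean()

# Stand-ins for pretrained-encoder sentence embeddings (e.g., BERT [CLS] vectors);
# in practice these would come from a frozen text encoder, which is never retrained.
B, D = 32, 768
z_orig = torch.randn(B, D)                     # original sentences
z_aug = torch.randn(B, D)                      # counterfactual versions
bias_dirs = F.normalize(torch.randn(4, D), dim=-1)  # e.g., he-she style directions

fairfil = FairFilter(dim=D)
opt = torch.optim.Adam(fairfil.parameters(), lr=1e-4)

for step in range(100):
    f_orig, f_aug = fairfil(z_orig), fairfil(z_aug)
    loss = info_nce(f_orig, f_aug) + 0.1 * bias_correlation(f_orig, bias_dirs)
    opt.zero_grad(); loss.backward(); opt.step()
```

Because only the lightweight filter is trained, the pretrained encoder stays frozen, which is what makes the method post-hoc and retraining-free.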