Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets. In this work, we ask: does training on demographically perturbed data lead to fairer language models? We collect a large dataset of human-annotated text perturbations and train an automatic perturber on it, which we show outperforms heuristic alternatives. We find that: (i) language models (LMs) pre-trained on demographically perturbed corpora are fairer, at least according to our current best metrics for measuring model fairness, and (ii) LMs finetuned on perturbed GLUE datasets exhibit less demographic bias on downstream tasks. We also find that improved fairness does not come at the expense of accuracy. Although our findings appear promising, there are still some limitations, as well as outstanding questions about how best to evaluate the (un)fairness of large language models. We hope that this initial exploration of neural demographic perturbation will help drive further improvement towards fairer NLP.
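To make the contrast with "heuristic alternatives" concrete, below is a minimal sketch of the kind of rule-based, lexicon-driven perturber such heuristics typically amount to; the lexicon and function names are illustrative assumptions, not the paper's method, and the learned neural perturber described above is not shown here.

```python
# Toy heuristic demographic perturber: swaps gendered terms via a fixed lexicon.
# Illustrative only -- NOT the paper's trained seq2seq perturber.

import re

# Hypothetical, tiny swap lexicon (assumption for illustration).
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her", "hers": "his",
    "man": "woman", "woman": "man",
}

def heuristic_perturb(text: str) -> str:
    """Replace each lexicon word with its counterpart, preserving capitalization."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        target = GENDER_SWAPS[word.lower()]
        return target.capitalize() if word[0].isupper() else target

    pattern = r"\b(" + "|".join(GENDER_SWAPS) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

print(heuristic_perturb("She parked her car."))
# -> "He parked him car."
# Word-level swapping cannot tell possessive 'her' (-> 'his') from object 'her'
# (-> 'him'), one reason a context-aware, learned perturber can do better.
```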