Adversarial training is a common approach for bias mitigation in natural language processing. Although most work on debiasing is motivated by equal opportunity, this criterion is not explicitly captured in standard adversarial training. In this paper, we propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features and to model equal opportunity more explicitly. Experimental results on two datasets show that our method substantially improves over standard adversarial debiasing methods in terms of the performance--fairness trade-off.
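As a rough illustration of the idea, the sketch below shows one way a discriminator input could be augmented with the target class, together with a simple equal-opportunity gap measured as the difference in true positive rate between protected groups. The function names (`one_hot`, `augment_discriminator_input`, `tpr_gap`) and the use of plain concatenation are assumptions made for this example; the abstract does not specify how the target class is combined with the hidden representation, so this should not be read as the paper's architecture.

```python
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot encoding of an integer class label."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def augment_discriminator_input(
    hidden: Sequence[float], target: int, num_classes: int
) -> List[float]:
    """Build the discriminator input from a hidden representation and the target class.

    Assumption: the target class is appended as a one-hot vector. A standard
    adversarial discriminator would receive only `hidden`.
    """
    return list(hidden) + one_hot(target, num_classes)


def tpr_gap(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[int],
    positive: int = 1,
) -> float:
    """Largest difference in true positive rate across protected groups.

    A gap of 0.0 means the groups have equal true positive rates for the
    `positive` class, which is the equal opportunity condition.
    """
    rates = []
    for g in sorted(set(group)):
        # Keep only instances of this group whose gold label is the positive class.
        hits = [
            p == positive
            for t, p, a in zip(y_true, y_pred, group)
            if t == positive and a == g
        ]
        if hits:
            rates.append(sum(hits) / len(hits))
    return max(rates) - min(rates) if rates else 0.0


if __name__ == "__main__":
    # Hypothetical 2-dimensional hidden state, target class 1 out of 3 classes.
    print(augment_discriminator_input([0.5, -1.0], target=1, num_classes=3))
    # Hypothetical predictions for two protected groups (0 and 1).
    print(tpr_gap([1, 1, 1, 1, 0], [1, 1, 1, 0, 1], [0, 0, 1, 1, 1]))
```

Conditioning the discriminator on the target class lets it look for protected-attribute leakage within each class separately, which is closer to the per-class true positive rate comparison that equal opportunity is defined on.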