Adversarial learning can produce fairer and less biased models of language than standard training methods. However, current adversarial techniques only partially mitigate model bias, and their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on the use of multiple diverse discriminators, whereby the discriminators are encouraged to learn hidden representations that are orthogonal to one another's. Experimental results show that our method substantially improves over standard adversarial removal methods, both in reducing bias and in the stability of training.
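To make the core idea concrete, the following is a minimal sketch of one way an orthogonality penalty between multiple discriminators' hidden representations could be implemented. All names (`Discriminator`, `orthogonality_penalty`, the MLP architecture) and the specific penalty form (squared Frobenius norm of pairwise cross-correlations between L2-normalised hidden states) are illustrative assumptions in a PyTorch-style setup, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """A small MLP discriminator that exposes its hidden representation
    so an orthogonality penalty can be computed across discriminators."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor):
        h = self.hidden(x)  # hidden representation used for the penalty
        return self.out(h), h


def orthogonality_penalty(hiddens: list) -> torch.Tensor:
    """Penalise overlap between the hidden spaces of each discriminator pair.

    Each element of `hiddens` is a (batch, hidden_dim) tensor. The penalty
    is the mean squared Frobenius norm of the cross-correlation matrix
    between L2-normalised hidden representations, which is driven toward
    zero when the representations are mutually orthogonal.
    """
    normed = [h / (h.norm(dim=1, keepdim=True) + 1e-8) for h in hiddens]
    penalty = hiddens[0].new_zeros(())
    n_pairs = 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            # (hidden_dim, hidden_dim) cross-correlation between pair (i, j)
            cross = normed[i].t() @ normed[j]
            penalty = penalty + (cross ** 2).sum()
            n_pairs += 1
    return penalty / max(n_pairs, 1)
```

Under this sketch, the penalty would be added (with some weight) to the usual adversarial objective, so each discriminator is pushed to attack the protected attribute through a different subspace rather than all collapsing onto the same features.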