Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (such as gender or race). However, when the sensitive information is semantically entangled with the task-relevant information in the input, e.g., when gender is predictive of a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, providing no control over how much bias actually needs to be removed. We argue that a favorable debiasing method should use sensitive information 'fairly' rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019). In this work, we propose a novel debiasing algorithm that adjusts the predictive model's belief to (1) ignore the sensitive information if it is not useful for the task, and (2) use sensitive information minimally, as necessary for the prediction, while also incurring a penalty for doing so. Experimental results on two text classification tasks (influenced by gender) and an open-ended generation task (influenced by race) indicate that our model achieves a desirable trade-off between debiasing and task performance, while also producing debiased rationales as evidence.
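The "use minimally, with a penalty" idea in point (2) can be pictured with a short sketch. This is not the paper's algorithm but a minimal conceptual illustration, assuming a PyTorch classifier: reliance on the sensitive attribute is approximated by how much the predictive distribution shifts when sensitive tokens are masked, and that reliance is added to the task loss as a weighted penalty. All names (`debiased_loss`, `x_masked`, `lambda_penalty`) are hypothetical.

```python
# Minimal conceptual sketch (not the paper's implementation): task loss plus a
# penalty on how much the prediction depends on the sensitive attribute.
import torch
import torch.nn.functional as F

def debiased_loss(model, x, x_masked, y, lambda_penalty=0.1):
    """x_masked is a copy of x with sensitive tokens masked or neutralized."""
    logits = model(x)                # prediction with the full input
    logits_masked = model(x_masked)  # prediction without sensitive information
    task_loss = F.cross_entropy(logits, y)
    # Reliance on sensitive information, measured as the KL divergence between
    # the masked and unmasked predictive distributions.
    reliance = F.kl_div(
        F.log_softmax(logits_masked, dim=-1),
        F.softmax(logits, dim=-1),
        reduction="batchmean",
    )
    # The model may still use sensitive information, but only at a cost
    # controlled by lambda_penalty.
    return task_loss + lambda_penalty * reliance
```

Under this kind of objective, a prediction that can be made equally well without the sensitive attribute incurs no penalty, while one that leans on it pays a cost proportional to how much it does so.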