Written language carries explicit and implicit biases that can distract from meaningful signals. For example, letters of reference may describe male and female candidates differently, or their writing style may indirectly reveal demographic characteristics. At best, such biases distract from the meaningful content of the text; at worst, they can lead to unfair outcomes. We investigate the challenge of re-generating input sentences to 'neutralize' sensitive attributes while maintaining the semantic meaning of the original text (e.g., is the candidate qualified?). We propose a gradient-based rewriting framework, Detect and Perturb to Neutralize (DEPEN), that first detects sensitive components and masks them for regeneration, then perturbs the generation model at decoding time under a neutralizing constraint that pushes the (predicted) distribution of sensitive attributes towards a uniform distribution. Our experiments in two different scenarios show that DEPEN can regenerate fluent alternatives that are neutral in the sensitive attribute while maintaining the semantics of other attributes.
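The neutralizing constraint can be illustrated concretely: if a classifier predicts a distribution over sensitive-attribute values from the decoder's logits, pushing that distribution towards uniform amounts to gradient descent on its KL divergence from the uniform distribution. The sketch below is an illustrative NumPy toy (the function names and the direct optimization of a single logit vector are assumptions for exposition, not the authors' implementation, which perturbs the generation model's hidden states at decoding time):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_to_uniform(p):
    # KL(p || uniform) over K classes; zero iff p is uniform.
    K = len(p)
    return float(np.sum(p * np.log(p * K)))

def neutralize(z, lr=0.5, steps=200):
    # Gradient descent on KL(softmax(z) || uniform).
    # Analytic gradient w.r.t. logits: g_i = p_i * (log p_i + H(p)),
    # where H(p) is the entropy of p.
    z = z.astype(float).copy()
    for _ in range(steps):
        p = softmax(z)
        H = -np.sum(p * np.log(p))
        g = p * (np.log(p) + H)
        z -= lr * g
    return z

# A skewed predicted attribute distribution...
logits = np.array([2.0, 0.5, -1.0])
before = kl_to_uniform(softmax(logits))
# ...is pushed towards uniform by the constraint.
after = kl_to_uniform(softmax(neutralize(logits)))
```

In DEPEN this gradient is taken through an attribute classifier and applied to the decoder's states rather than to a raw logit vector, but the direction of the update is the same: reduce the divergence between the predicted sensitive-attribute distribution and uniform.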