Pretrained language models (PLMs), such as GPT2, have achieved remarkable empirical performance in text generation tasks. However, because PLMs are pretrained on large-scale natural language corpora, their generated text may exhibit social bias against disadvantaged demographic groups. To improve the fairness of PLMs in text generation, we propose to minimize the mutual information between the semantics of the generated sentences and their demographic polarity, i.e., the demographic group to which a sentence refers. In this way, the mention of a demographic group (e.g., male or female) is encouraged to be independent of how it is described in the generated text, thus effectively alleviating social bias. Moreover, we propose to efficiently estimate an upper bound of the above mutual information via importance sampling, leveraging a natural language corpus. We also propose a distillation mechanism that preserves the language modeling ability of the PLMs after debiasing. Empirical results on real-world benchmarks demonstrate that the proposed method yields superior performance in terms of both fairness and language modeling ability.
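To make the training objective concrete, the following is a minimal sketch of how a combined loss of this kind could look: an importance-sampling surrogate for the mutual-information upper bound plus a KL-distillation term toward the frozen pretrained LM. All function and variable names here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def debias_loss(logits_debiased, logits_frozen, polarity_logp, importance_w,
                lambda_distill=1.0):
    """Hypothetical combined objective for debiasing a PLM.

    logits_debiased: [B, T, V] next-token logits from the model being debiased
    logits_frozen:   [B, T, V] logits from the frozen pretrained LM (teacher)
    polarity_logp:   [B] log p(demographic polarity | sentence) from a classifier
    importance_w:    [B] importance-sampling weights for the corpus sentences
    """
    # Surrogate for the mutual-information upper bound: an importance-weighted
    # average of the polarity log-likelihood over corpus sentences. Minimizing
    # it discourages the sentence semantics from predicting the demographic group.
    mi_upper = torch.sum(importance_w * polarity_logp) / importance_w.sum()

    # Distillation term: keep the debiased model's token distribution close to
    # the original pretrained LM, preserving language modeling ability.
    distill = F.kl_div(
        F.log_softmax(logits_debiased, dim=-1),
        F.softmax(logits_frozen, dim=-1),
        reduction="batchmean",
    )
    return mi_upper + lambda_distill * distill
```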