Technology for language generation has advanced rapidly, spurred by advances in pre-training large models on massive amounts of data and by the need for intelligent agents to communicate naturally. While these techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how techniques contribute to biases and on progress towards bias analysis and mitigation. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call attention to promising directions for research and to the importance of fairness and inclusivity considerations for language generation applications.
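For concreteness, the decoding techniques whose bias effects the survey's experiments quantify are the standard search and sampling strategies applied to a model's next-token distribution. The following is a minimal illustrative sketch (not the paper's experimental code) of three common strategies, greedy search, top-k sampling, and nucleus (top-p) sampling, assuming a vocabulary-sized logits array; all function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable conversion of logits to probabilities.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def greedy_decode(logits):
    # Deterministic: always emit the single most likely token.
    return int(np.argmax(logits))

def top_k_sample(logits, k=40, rng=None):
    # Restrict sampling to the k most likely tokens, renormalized.
    rng = rng or np.random.default_rng()
    candidates = np.argsort(logits)[-k:]
    probs = softmax(logits[candidates])
    return int(rng.choice(candidates, p=probs))

def nucleus_sample(logits, top_p=0.9, rng=None):
    # Sample from the smallest set of tokens whose cumulative
    # probability mass reaches top_p, renormalized.
    rng = rng or np.random.default_rng()
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]  # most to least likely
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renorm))

# Usage: pick the next token from toy logits over a 10-token vocabulary.
logits = np.random.default_rng(0).normal(size=10)
print(greedy_decode(logits), top_k_sample(logits, k=5), nucleus_sample(logits))
```

The relevant design point is that these strategies trade off determinism against diversity (greedy is fully deterministic, while larger k or top_p admits lower-probability tokens), which is why the choice of decoding technique can change what biases surface in generated text.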