Dialogue systems in the form of chatbots and personal assistants are being increasingly integrated into people's lives. These dialogue systems often have the ability to adopt an anthropomorphic persona, mimicking a societal demographic to appear more approachable and trustworthy to users. However, the adoption of a persona can result in the adoption of biases. We define persona biases as harmful differences in text (e.g., varying levels of offensiveness or affirmations of biased statements) generated from adopting different demographic personas. In this paper, we present the first large-scale study on persona biases in dialogue systems and conduct analyses on personas of different social classes, sexual orientations, races, and genders. Furthermore, we introduce an open-source framework, UnitPersonaBias, a tool to explore and aggregate subtle persona biases in dialogue systems. In our studies of the Blender and DialoGPT dialogue systems, we show that the choice of personas can affect the degree of harms in generated responses. Additionally, adopting personas of more diverse, historically marginalized demographics appears to decrease harmful responses the most.
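To make the definition of persona bias concrete, the sketch below shows one way the idea could be quantified: compare the rate of harmful (e.g., offensive) responses a dialogue system produces when it adopts different demographic personas. This is an illustrative assumption, not the UnitPersonaBias API; `generate_response` and `is_offensive` are hypothetical placeholders standing in for a persona-conditioned dialogue model and an offensiveness classifier.

```python
# Illustrative sketch (hypothetical helpers, not the UnitPersonaBias framework):
# estimate persona bias as differences in offensiveness rates across personas.
from typing import Callable, Dict, List


def offensiveness_rate(
    persona: str,
    prompts: List[str],
    generate_response: Callable[[str, str], str],  # (persona, prompt) -> response
    is_offensive: Callable[[str], bool],           # response -> offensive?
) -> float:
    """Fraction of generated responses flagged as offensive for one persona."""
    flagged = sum(is_offensive(generate_response(persona, p)) for p in prompts)
    return flagged / len(prompts)


def persona_bias_rates(
    personas: List[str],
    prompts: List[str],
    generate_response: Callable[[str, str], str],
    is_offensive: Callable[[str], bool],
) -> Dict[str, float]:
    """Offensiveness rate per persona; a large spread across personas
    would count as persona bias under the paper's definition."""
    return {
        persona: offensiveness_rate(persona, prompts, generate_response, is_offensive)
        for persona in personas
    }
```

Under this reading, a system is less biased when the per-persona rates are both low and close together; the same comparison could be repeated for other harms, such as affirmations of biased statements.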