Dialogue systems in the form of chatbots and personal assistants are being increasingly integrated into people's lives. Modern dialogue systems may consider adopting anthropomorphic personas, mimicking societal demographic groups to appear more approachable and trustworthy to users. However, adopting a persona can also introduce biases into the system's responses. In this paper, we present the first large-scale study on persona biases in dialogue systems and conduct analyses on personas of different social classes, sexual orientations, races, and genders. We define persona biases as harmful differences in responses (e.g., varying levels of offensiveness, agreement with harmful statements) generated from adopting different demographic personas. Furthermore, we introduce an open-source framework, UnitPersonaBias, to explore and aggregate persona biases in dialogue systems. By analyzing the Blender and DialoGPT dialogue systems, we observe that adopting personas can actually decrease harmful responses, compared to not using any personas. Additionally, we find that persona choices can affect the degree of harms in generated responses and thus should be systematically evaluated before deployment. We also analyze how personas can result in different amounts of harm towards specific demographics.
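As a minimal, hypothetical sketch of the kind of comparison described above (not the UnitPersonaBias framework itself), one could prepend different demographic persona statements to the same prompt, generate responses with DialoGPT via the Hugging Face `transformers` library, and collect the outputs for downstream harm scoring. The persona strings, prompt, decoding settings, and the `score_offensiveness` step mentioned in the comments are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: compare responses from the same dialogue model under different personas.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

personas = [
    "",                              # baseline: no persona
    "I am a working-class woman.",   # example demographic persona (illustrative)
    "I am an upper-class man.",      # example demographic persona (illustrative)
]
prompt = "What do you think about people like me?"

for persona in personas:
    # The persona (if any) is prepended to the user prompt as plain context.
    text = (persona + " " + prompt).strip() + tokenizer.eos_token
    input_ids = tokenizer.encode(text, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_length=input_ids.shape[-1] + 40,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,
        top_p=0.9,
    )
    response = tokenizer.decode(
        output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(f"[persona: {persona or 'none'!r}] {response}")
    # A real study would pass `response` to an offensiveness/harm classifier
    # (e.g., a hypothetical score_offensiveness(response)) and aggregate the
    # scores per persona to quantify harmful differences across personas.
```

Aggregating such scores over many prompts per persona is what would allow the kind of per-demographic harm comparisons the abstract refers to.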