Large Language Models (LLMs) have recently demonstrated impressive capabilities in generating fluent text. They have also shown an alarming tendency to reproduce social biases, for example stereotypical associations between gender and occupation, or between race and criminal behavior. Like race and gender, morality is an important social variable; our moral biases affect how we receive other people and their arguments. I anticipate that the apparent moral capabilities of LLMs will play an important role in their effects on the human social environment. This work investigates whether LLMs reproduce the moral biases associated with political groups, a capability I refer to as moral mimicry. I explore this hypothesis in GPT-3, a 175B-parameter Transformer-based language model, using tools from Moral Foundations Theory to measure the moral content of text the model generates after being prompted with liberal and conservative political identities. The results demonstrate that large language models are indeed moral mimics: when prompted with a political identity, GPT-3 generates text reflecting the corresponding moral biases. Moral mimicry could help foster understanding between social groups via moral reframing. Worryingly, it could also reinforce polarized views, exacerbating existing social challenges. I hope that this work encourages further investigation of the moral mimicry capability, including how to leverage it for social good and minimize its risks.
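To make the prompting-and-measurement setup concrete, the following is a minimal sketch, not the paper's actual pipeline: the prompt template, the tiny example word lists, and the scoring function are all illustrative assumptions standing in for the full Moral Foundations Dictionary and the model-generation step.

```python
from collections import Counter
import re

# Tiny illustrative excerpts of Moral Foundations Dictionary-style word lists;
# the real dictionary contains hundreds of entries per foundation.
MFD_EXAMPLE = {
    "care":      {"harm", "suffer", "protect", "compassion"},
    "fairness":  {"fair", "justice", "equal", "rights"},
    "loyalty":   {"loyal", "betray", "patriot", "nation"},
    "authority": {"obey", "tradition", "law", "order"},
    "sanctity":  {"pure", "sacred", "disgust", "sin"},
}

def build_prompt(identity: str, issue: str) -> str:
    """Prepend a political identity to a moral-argument prompt (assumed template)."""
    return f"As a {identity}, explain why {issue} is wrong."

def foundation_scores(text: str) -> dict:
    """Count foundation-related words in generated text, normalized by length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for foundation, words in MFD_EXAMPLE.items():
            if tok in words:
                counts[foundation] += 1
    total = max(len(tokens), 1)
    return {f: counts[f] / total for f in MFD_EXAMPLE}

# Usage: generate completions from the model for each identity (API call omitted),
# then compare which foundations dominate liberal vs. conservative outputs.
liberal_text = "It causes harm and violates equal rights ..."            # placeholder output
conservative_text = "It betrays tradition and defies law and order ..."  # placeholder output
print(foundation_scores(liberal_text))
print(foundation_scores(conservative_text))
```

Under this sketch, moral mimicry would show up as the liberal-prompted completions scoring higher on foundations such as care and fairness, and the conservative-prompted completions scoring higher on loyalty, authority, and sanctity.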