Ethical and social risks of harm from Language Models

Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities. In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
