Recent advances in the capacity of large language models to generate human-like text have resulted in their increased adoption in user-facing settings. In parallel, these improvements have prompted a heated discourse around the risks of societal harms they introduce, whether inadvertent or malicious. Several studies have explored these harms and called for their mitigation via the development of safer, fairer models. Going beyond enumerating these risks, this work provides a survey of practical methods for addressing potential threats and societal harms from language generation models. We draw on several prior works' taxonomies of language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks and harms of language generators. Bridging diverse strands of research, this survey aims to serve as a practical guide for both LM researchers and practitioners, with explanations of the motivations behind different mitigation strategies, their limitations, and open problems for future research.