Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in the coming years due to integration in product interfaces like ChatGPT and search engines like Bing. This intensifies the need to ensure that models are aligned with human preferences and do not produce unsafe, inaccurate or toxic outputs. While alignment techniques like reinforcement learning from human feedback (RLHF) and red-teaming can mitigate some safety concerns and improve model capabilities, it is unlikely that an aggregate fine-tuning process can adequately represent the full range of users' preferences and values. Different people may legitimately disagree on their preferences for language and conversational norms, as well as on the values or ideologies which guide their communication. Personalising LLMs through micro-level preference learning processes may result in models that are better aligned with each user. However, there are several normative challenges in defining the bounds of a societally-acceptable and safe degree of personalisation. In this paper, we ask how, and in what ways, LLMs should be personalised. First, we review literature on current paradigms for aligning LLMs with human feedback, and identify issues including (i) a lack of clarity regarding what alignment means; (ii) a tendency of technology providers to prescribe definitions of inherently subjective preferences and values; and (iii) a 'tyranny of the crowdworker', exacerbated by a lack of documentation of who we are really aligning to. Second, we present a taxonomy of benefits and risks associated with personalised LLMs, for individuals and society at large. Finally, we propose a three-tiered policy framework that allows users to experience the benefits of personalised alignment, while restraining unsafe and undesirable LLM behaviours within (supra-)national and organisational bounds.