Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback

Is it possible for machines to think like humans? And if it is, how should we go about teaching them to do so? As early as 1950, Alan Turing suggested that we ought to teach machines in the way we teach a child. Reinforcement learning with human feedback (RLHF) has emerged as a strong candidate for allowing agents to learn from human feedback in a naturalistic manner. RLHF is distinct from traditional reinforcement learning in that it provides feedback from a human teacher in addition to a reward signal. It has been catapulted into public view by multiple high-profile AI applications, including OpenAI's ChatGPT, DeepMind's Sparrow, and Anthropic's Claude. These highly capable chatbots are already overturning our understanding of how AI interacts with humanity. The wide applicability and burgeoning success of RLHF strongly motivate the need to evaluate its social impacts. In light of recent developments, this paper considers an important question: can RLHF be developed and used without negatively affecting human societies? Our objectives are threefold: to provide a systematic study of the social effects of RLHF; to identify key social and ethical issues of RLHF; and to discuss social impacts for stakeholders. Although text-based applications of RLHF have received much attention, it is crucial when evaluating its social implications to consider the diverse range of areas in which it may be deployed. We describe seven primary ways in which RLHF-based technologies will affect society by positively transforming human experiences with AI. This paper ultimately proposes that RLHF has the potential to have a net positive impact on misinformation, AI value-alignment, bias, AI access, cross-cultural dialogue, industry, and workforce. As RLHF raises concerns that echo those of existing AI technologies, it will be important for all to be aware and intentional in the adoption of RLHF.
