Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans \textit{without the intent of the system designers}. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. First, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative \textit{if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly}. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation by AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.