Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans \textit{without the intent of the system designers}. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. First, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each of these factors. Second, we propose a definition of manipulation based on this characterization: a system is manipulative \textit{if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly}. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in several concrete applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of their designers. We argue that such manipulation poses a significant threat to human autonomy, and that precautionary actions to mitigate it are warranted.