Characterizing Manipulation from AI Systems

Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans \textit{without the intent of the system designers}. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. First, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative \textit{if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly}. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.

