Social World Model-Augmented Mechanism Design Policy Learning

Designing adaptive mechanisms that align individual and collective interests remains a central challenge in artificial social intelligence. Existing methods often struggle to model heterogeneous agents with persistent latent traits (e.g., skills, preferences) and to handle complex multi-agent system dynamics. These challenges are compounded by the need for high sample efficiency, since real-world interactions are costly. World models, which learn to predict environmental dynamics, offer a promising pathway to enhance mechanism design in heterogeneous and complex systems. In this paper, we introduce a novel method, SWM-AP (Social World Model-Augmented Mechanism Design Policy Learning), which learns a social world model that hierarchically models agents' behavior to enhance mechanism design. Specifically, the social world model infers agents' traits from their interaction trajectories and learns a trait-based model to predict agents' responses to deployed mechanisms. The mechanism design policy collects extensive training trajectories by interacting with the social world model, while concurrently inferring agents' traits online during real-world interactions to further improve policy learning efficiency. Experiments in diverse settings (tax policy design, team coordination, and facility location) show that SWM-AP outperforms established model-based and model-free RL baselines in both cumulative reward and sample efficiency.
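
The abstract describes the method only at a high level. The toy sketch below illustrates the general loop it names: infer per-agent traits from interaction trajectories, predict responses with a trait-conditioned model, improve the mechanism using model rollouts instead of real interactions, and re-infer traits online as real data arrive. Everything concrete here is a hypothetical assumption, not the paper's method: the linear response rule a = theta * (1 - m), the tax-revenue objective, all function names, the closed-form least-squares trait estimate (standing in for a learned trait encoder), and the grid search that stands in for the RL policy optimizer.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy setting (assumption, not from the paper): each agent has a
# latent "skill" trait theta, the mechanism is a scalar tax rate m in [0, 1],
# and an agent responds with effort a = theta * (1 - m), optionally plus noise.
Trajectory = List[Tuple[float, float]]  # (tax rate m, observed response a)


def real_env_response(theta: float, m: float, rng: random.Random, noise: float = 0.0) -> float:
    """Stand-in for a costly real-world interaction with one agent."""
    return theta * (1.0 - m) + rng.gauss(0.0, noise)


def infer_trait(traj: Trajectory) -> float:
    """Trait inference: least-squares fit of theta in a = theta * (1 - m)."""
    num = sum(a * (1.0 - m) for m, a in traj)
    den = sum((1.0 - m) ** 2 for m, _ in traj)
    return num / den if den > 0 else 0.0


def model_response(theta_hat: float, m: float) -> float:
    """Trait-conditioned world model: predicts an agent's response to m."""
    return theta_hat * (1.0 - m)


def model_reward(thetas: Dict[str, float], m: float) -> float:
    """Imagined mechanism reward (here: total tax revenue) from the model."""
    return sum(m * model_response(t, m) for t in thetas.values())


def plan_mechanism(thetas: Dict[str, float], grid: int = 101) -> float:
    """Stand-in for RL policy learning: pick the best tax rate on a grid.

    Every candidate is evaluated with the learned model only, so no real
    interactions are spent during this policy-improvement step.
    """
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda m: model_reward(thetas, m))


def run(seed: int = 0, rounds: int = 3, noise: float = 0.0) -> Tuple[Dict[str, float], float]:
    rng = random.Random(seed)
    true_traits = {"a": 1.0, "b": 2.0, "c": 3.0}  # hidden from the designer
    trajs: Dict[str, Trajectory] = {k: [] for k in true_traits}
    thetas_hat: Dict[str, float] = {}
    m = 0.2  # initially deployed mechanism
    for _ in range(rounds):  # a few costly real-world rounds
        for name, theta in true_traits.items():
            trajs[name].append((m, real_env_response(theta, m, rng, noise)))
        # Online trait inference from all real interactions collected so far.
        thetas_hat = {k: infer_trait(v) for k, v in trajs.items()}
        # Policy improvement using only rollouts in the trait-based model.
        m = plan_mechanism(thetas_hat)
    return thetas_hat, m
```

With the default noise-free responses, `run()` recovers the hidden traits and settles on a tax rate of 0.5, which maximizes m(1 - m) times the sum of traits in this toy model; passing `noise > 0` makes the trait estimates approximate. Keeping trait inference separate from the response model mirrors the hierarchical split the abstract describes, so new real data only has to update the per-agent traits.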
