The real world is awash with multi-agent problems that require collective action by self-interested agents, from the routing of packets across a computer network to the management of irrigation systems. Such systems have local incentives for individuals, whose behavior has an impact on the global outcome for the group. Given appropriate mechanisms describing agent interaction, groups may achieve socially beneficial outcomes, even in the face of short-term selfish incentives. In many cases, collective action problems possess an underlying graph structure, whose topology crucially determines the relationship between local decisions and emergent global effects. Such scenarios have received great attention through the lens of network games. However, this abstraction typically collapses important dimensions, such as geometry and time, relevant to the design of mechanisms promoting cooperation. In parallel work, multi-agent deep reinforcement learning has shown great promise in modelling the emergence of self-organized cooperation in complex gridworld domains. Here we apply this paradigm in graph-structured collective action problems. Using multi-agent deep reinforcement learning, we simulate an agent society for a variety of plausible mechanisms, finding clear transitions between different equilibria over time. We define analytic tools inspired by related literatures to measure the social outcomes, and use these to draw conclusions about the efficacy of different environmental interventions. Our methods have implications for mechanism design in both human and artificial agent systems.