From social networks to traffic routing, artificial learning agents are playing a central role in modern institutions. We must therefore understand how to leverage these systems to foster outcomes and behaviors that align with our own values and aspirations. Although multiagent learning has received considerable attention in recent years, artificial agents have been evaluated primarily against fixed, non-learning co-players. While this evaluation scheme has merit, it fails to capture the dynamics faced by institutions that must deal with adaptive, continually learning constituents. Here we address this limitation and construct agents ("mechanisms") that perform well when evaluated over the learning trajectory of their adaptive co-players ("participants"). The algorithm we propose consists of two nested learning loops: an inner loop in which participants learn to best respond to fixed mechanisms, and an outer loop in which the mechanism agent updates its policy based on experience. We report the performance of our mechanism agents when paired with both artificial learning agents and humans as co-players. Our results show that our mechanisms can shepherd the participants' strategies towards favorable outcomes, indicating a path for modern institutions to effectively and automatically influence the strategies and behaviors of their constituents.
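The nested-loop structure can be illustrated with a toy sketch. The code below is a minimal, hypothetical instantiation, not the paper's algorithm or environment: the two-action payoff matrix, the bandit-style update rule, the horizons, and all names (Agent, run_episode) are illustrative assumptions. It shows a mechanism whose action is held fixed while a freshly initialized participant learns to best respond (inner loop), and which is then updated from the return accumulated over the participant's entire learning trajectory (outer loop).

```python
# Toy sketch of the two nested learning loops (illustrative assumptions only;
# names, payoffs, update rules, and horizons are not taken from the paper).
import random

class Agent:
    """Two-action policy updated by a simple bandit-style rule."""
    def __init__(self):
        self.prefs = [0.0, 0.0]  # estimated value of each action

    def act(self, eps=0.1):
        # Epsilon-greedy action selection.
        if random.random() < eps:
            return random.randrange(2)
        return max(range(2), key=lambda a: self.prefs[a])

    def update(self, action, reward, lr=0.1):
        self.prefs[action] += lr * (reward - self.prefs[action])

def run_episode(mechanism_action, participant_action):
    """Stand-in environment: returns (mechanism reward, participant reward)."""
    if mechanism_action == 1 and participant_action == 1:
        return 1.0, 1.0            # the mechanism's preferred joint outcome
    return 0.0, (0.5 if participant_action == 0 else 0.0)

mechanism = Agent()
for outer_step in range(200):      # outer loop: mechanism updates from experience
    m_a = mechanism.act()          # mechanism held fixed throughout the inner loop
    participant = Agent()          # fresh participant, evaluated over its learning trajectory
    mechanism_return = 0.0
    for inner_step in range(50):   # inner loop: participant learns to best respond
        p_a = participant.act()
        m_r, p_r = run_episode(m_a, p_a)
        participant.update(p_a, p_r)
        mechanism_return += m_r
    # The mechanism's update uses the return over the whole participant trajectory.
    mechanism.update(m_a, mechanism_return / 50)

print(mechanism.prefs)  # the action that shepherds the participant should score higher
```

In this sketch the mechanism learns to play the action that steers the participant's learning towards the mutually rewarding outcome, mirroring (in a much simplified form) the idea of evaluating the mechanism over the participants' learning trajectories rather than against fixed co-players.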