Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker, and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state of the art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
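For intuition, the sketch below shows classical fictitious play on a toy two-player zero-sum matrix game (rock-paper-scissors): each player repeatedly best-responds to the opponent's empirical average policy. This is only a minimal reference point under simplifying assumptions (2 players, a tiny action space, an exact best response), not the paper's algorithm, which instead uses learned neural policies and a sampled approximate best response to scale to 7 players and combinatorial, simultaneous moves.

```python
import numpy as np

# Minimal sketch: classical fictitious play on rock-paper-scissors.
# Not the paper's method; an illustrative toy under simplifying assumptions.

payoff = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)  # row player's payoff (R, P, S)

counts = [np.ones(3), np.ones(3)]  # empirical action counts per player

for t in range(10_000):
    for p in (0, 1):
        opp = 1 - p
        opp_avg = counts[opp] / counts[opp].sum()   # opponent's average policy
        mat = payoff if p == 0 else -payoff.T       # this player's payoff matrix
        br = np.argmax(mat @ opp_avg)               # exact best response action
        counts[p][br] += 1                          # fold it into the average

print("approximate equilibrium policies:",
      counts[0] / counts[0].sum(),
      counts[1] / counts[1].sum())
```

Running the sketch drives both average policies toward the uniform equilibrium of rock-paper-scissors, illustrating the iterate-and-best-respond structure that the paper's policy iteration methods approximate at scale.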