Equilibrium selection in multi-agent games refers to the problem of selecting a Pareto-optimal equilibrium among multiple existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policies of the other agents during training. To address suboptimal equilibrium selection, we propose Pareto-AC (PAC), an actor-critic algorithm that utilises a simple principle of no-conflict games (a superset of cooperative games with identical rewards): each agent can assume the others will choose actions that lead to a Pareto-optimal equilibrium. We evaluate PAC in a diverse set of multi-agent games and show that it converges to higher episodic returns than alternative MARL algorithms, and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose a graph neural network extension of PAC, which we show scales efficiently to games with up to 15 agents.
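The following is a minimal, illustrative sketch of the stated principle on a common-reward stag-hunt matrix game, not the paper's actual actor-critic objective or implementation; the payoff values and variable names are assumptions chosen only to show why an agent that optimistically assumes coordinated play selects the Pareto-optimal equilibrium, where independent risk-averse learners typically settle for the Pareto-dominated one.

```python
# Minimal sketch (assumed payoffs, not PAC itself) of the no-conflict principle:
# both agents receive the same payoff, so each agent may assume the other will
# complete the joint action that maximises the shared return.
import numpy as np

# Joint payoff R[a1, a2], identical for both agents (a no-conflict game).
# Action 0 = "stag" (high reward if coordinated), action 1 = "hare" (safe).
R = np.array([[4.0, 0.0],
              [0.0, 2.0]])

# Optimistic value of each own action: max over the other agent's actions.
optimistic_q_agent1 = R.max(axis=1)   # agent 1 assumes agent 2 cooperates
optimistic_q_agent2 = R.max(axis=0)   # agent 2 assumes agent 1 cooperates

a1 = int(np.argmax(optimistic_q_agent1))
a2 = int(np.argmax(optimistic_q_agent2))
print(a1, a2, R[a1, a2])  # both pick "stag" -> Pareto-optimal payoff 4.0
```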