Learning to coordinate among multiple agents is an essential problem in multi-agent systems. Multi-agent reinforcement learning has long been a go-to tool for complex collaborative environments. However, most existing works are constrained by the assumption that all agents take actions simultaneously. In this paper, we impose a hierarchical order of play on the agents by electing a first-move agent, to which the remaining agents take the best response, yielding better coordination. We propose EFA-DQN, an algorithm that implicitly models this coordination and learns coordinated behavior in multi-agent systems. To verify the feasibility and demonstrate the effectiveness and efficiency of our algorithm, we conduct extensive experiments on several multi-agent tasks with varying numbers of agents: Cooperative Navigation, Physical Deception, and Google Research Football. The empirical results across these scenarios show that our method achieves better performance and faster convergence than competitive baselines, suggesting that our algorithm has broad prospects for addressing many complex real-world problems.
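To make the leader-follower idea concrete, below is a minimal sketch of Stackelberg-style action selection: a first-move agent is elected and the remaining agents best-respond to its announced action. The election rule (highest max-Q), the `q_values` and `follower_q` arrays standing in for Q-network outputs, and all names are illustrative assumptions, not the paper's actual EFA-DQN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4

# Stand-ins for Q-network outputs at the current joint observation.
# q_values[i, a]: agent i's Q-estimate for its own action a.
# follower_q[i, l, a]: agent i's Q-estimate for action a, conditioned
# on the leader's announced action l. (Hypothetical shapes/names.)
q_values = rng.standard_normal((n_agents, n_actions))
follower_q = rng.standard_normal((n_agents, n_actions, n_actions))

# Elect the first-move agent, e.g. the one with the highest max-Q
# (one plausible election criterion, assumed here for illustration).
leader = int(np.argmax(q_values.max(axis=1)))
leader_action = int(np.argmax(q_values[leader]))

# The followers take the best response to the leader's action.
joint_action = {}
for i in range(n_agents):
    if i == leader:
        joint_action[i] = leader_action
    else:
        joint_action[i] = int(np.argmax(follower_q[i, leader_action]))

print(f"leader={leader}, joint_action={joint_action}")
```

Because the followers condition on the leader's committed action rather than acting simultaneously, ties between equally good joint actions are broken consistently, which is the coordination benefit the hierarchical order of play is meant to provide.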