Learning to coordinate actions among agents is essential in complex multi-agent systems. Prior work is constrained mainly by the assumption that all agents act simultaneously, and asynchronous action coordination between agents is rarely considered. This paper introduces a bi-level multi-agent decision hierarchy for coordinated behavior planning. We propose a novel election mechanism that adopts a graph convolutional network to model the interactions among agents and elect a first-move agent that provides asynchronous guidance. We also propose a dynamically weighted mixing network to effectively reduce misestimation of the value function during training. This work is the first to explicitly model asynchronous multi-agent action coordination, and this explicitness enables the selection of the optimal first-move agent. Results on Cooperative Navigation and Google Football demonstrate that the proposed algorithm achieves superior performance in cooperative environments. Our code is available at \url{https://github.com/Amanda-1997/EFA-DWM}.
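To make the election mechanism more concrete, the following is a minimal sketch, not the authors' implementation, of how a graph-convolutional scoring module could elect a first-move agent from per-agent observations. It assumes a fully connected interaction graph and uses illustrative names (e.g., \texttt{FirstMoverElection}, \texttt{hidden\_dim}) that do not appear in the paper.

\begin{verbatim}
# Hypothetical sketch of a GCN-based first-mover election; names and the
# fully connected adjacency are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class FirstMoverElection(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden_dim)  # per-agent observation encoder
        self.gcn = nn.Linear(hidden_dim, hidden_dim)  # shared weights for neighbor aggregation
        self.score = nn.Linear(hidden_dim, 1)         # per-agent election score

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim); assume a fully connected interaction graph
        n = obs.size(0)
        adj = torch.ones(n, n) / n                    # row-normalized adjacency with self-loops
        h = torch.relu(self.encode(obs))              # node embeddings
        h = torch.relu(self.gcn(adj @ h))             # one round of graph convolution
        logits = self.score(h).squeeze(-1)            # (n_agents,)
        return torch.softmax(logits, dim=-1)          # election distribution over agents

# Usage: probs = FirstMoverElection(obs_dim=16)(torch.randn(4, 16))
#        first_mover = probs.argmax()                # elected first-move agent index
\end{verbatim}

The elected agent's action can then condition the remaining agents' policies, which is the asynchronous guidance described above; the dynamically weighted mixing network would operate on the resulting per-agent values during training.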