In most existing studies on large-scale multi-agent coordination, the control methods aim to learn discrete policies for agents with a finite set of choices. They rarely select actions directly from continuous action spaces to provide finer-grained control, which makes them unsuitable for more complex tasks. To address the control problem in large-scale multi-agent systems with continuous action spaces, we propose a novel multi-agent reinforcement learning (MARL) coordination control method that derives stable continuous policies. By optimizing policies under a maximum-entropy learning objective, agents explore more effectively during execution and achieve strong performance after training. We also employ hierarchical graph attention networks (HGAT) and gated recurrent units (GRU) to improve the scalability and transferability of our method. Experiments show that our method consistently outperforms all baselines in large-scale multi-agent cooperative reconnaissance tasks.
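For reference, maximum-entropy RL augments the expected return with a policy-entropy bonus; a standard form of this objective (our notation, assumed rather than taken from this paper, with a temperature parameter \(\alpha\) weighting the entropy term) is

\[ J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right], \]

where \(\mathcal{H}\) is the entropy of the policy's action distribution. For a continuous (e.g., Gaussian) policy, maximizing \(J(\pi)\) discourages premature collapse to a deterministic action and thereby sustains exploration.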