Multi-agent reinforcement learning often suffers from a joint action space that grows exponentially with the number of agents. In this paper, we propose HAVEN, a novel value decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems. To address the instabilities that arise from the concurrent optimization of high-level and low-level policies, as well as from the concurrent optimization across agents, we introduce a dual coordination mechanism between inter-layer strategies and inter-agent strategies. HAVEN requires neither domain knowledge nor pretraining, and can be applied to any value decomposition variant. Our method achieves superior results compared to many baselines on StarCraft II micromanagement tasks and offers an efficient solution to multi-agent hierarchical reinforcement learning in fully cooperative scenarios.
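To make the high-level/low-level split concrete, the sketch below shows a generic hierarchical value-decomposition skeleton: a high-level network selects a subgoal per agent, low-level per-agent Q-networks condition on that subgoal, and the chosen per-agent values are combined by a mixer. This is a minimal illustration under our own assumptions, not the HAVEN implementation; all class names, dimensions, and the VDN-style additive mixer are hypothetical stand-ins (any value decomposition variant could replace the mixer).

```python
# A minimal sketch of hierarchical value decomposition, NOT the authors'
# HAVEN implementation. All names and dimensions here are hypothetical.
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Low-level per-agent Q-network conditioned on a high-level subgoal."""
    def __init__(self, obs_dim, subgoal_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + subgoal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, subgoal):
        return self.net(torch.cat([obs, subgoal], dim=-1))

class HighLevelQNet(nn.Module):
    """High-level Q-network that picks one of n_subgoals for each agent."""
    def __init__(self, obs_dim, n_subgoals, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_subgoals),
        )

    def forward(self, obs):
        return self.net(obs)

def mix_vdn(per_agent_q):
    # VDN-style additive mixing: Q_tot = sum_i Q_i. A state-conditioned
    # mixing network (e.g. QMIX-style) could be substituted here.
    return per_agent_q.sum(dim=-1)

# Toy forward pass: 3 agents, each with its own observation.
n_agents, obs_dim, n_subgoals, n_actions = 3, 10, 4, 6
high = HighLevelQNet(obs_dim, n_subgoals)
low = AgentQNet(obs_dim, n_subgoals, n_actions)

obs = torch.randn(n_agents, obs_dim)
subgoal_idx = high(obs).argmax(dim=-1)                     # high-level choice
subgoal = nn.functional.one_hot(subgoal_idx, n_subgoals).float()
q_low = low(obs, subgoal)                                  # per-agent Q-values
chosen_q = q_low.max(dim=-1).values                        # greedy low-level actions
q_tot = mix_vdn(chosen_q)                                  # joint value for training
```

In a setup like this, both levels are trained concurrently, which is exactly the source of the instability the dual coordination mechanism is meant to address.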