Multi-agent reinforcement learning (MARL) in incomplete-information environments has attracted extensive attention from researchers. However, due to slow sample collection and poor sample exploration, MARL still suffers from problems such as unstable model iteration and low training efficiency. Moreover, most existing distributed frameworks are designed for single-agent reinforcement learning and are not suitable for the multi-agent setting. In this paper, we design a distributed MARL framework based on an actor-worker-learner architecture. In this framework, multiple asynchronous environment interaction modules can be deployed simultaneously, which greatly improves sample collection speed and sample diversity. Meanwhile, to make full use of computing resources, we decouple model iteration from environment interaction, thus accelerating policy iteration. Finally, we verify the effectiveness of the proposed framework in the MaCA military simulation environment and the SMAC 3D real-time strategy gaming environment, both of which exhibit incomplete-information characteristics.
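The decoupling described above can be illustrated with a minimal sketch: several asynchronous "actor" threads collect samples into a shared queue while a separate "learner" consumes them, so environment interaction never blocks policy iteration. All names and the structure here are illustrative assumptions, not the paper's actual implementation.

```python
import queue
import threading

# Shared buffer between asynchronous actors and the learner.
sample_queue = queue.Queue()
NUM_ACTORS = 4        # hypothetical number of parallel interaction modules
STEPS_PER_ACTOR = 10  # hypothetical rollout length per actor

def actor(actor_id: int) -> None:
    """Interact with a (stubbed) environment and push transitions."""
    for step in range(STEPS_PER_ACTOR):
        transition = {"actor": actor_id, "step": step, "reward": 1.0}
        sample_queue.put(transition)

def learner(total_samples: int) -> list:
    """Consume samples asynchronously; a real learner would update the policy here."""
    consumed = []
    for _ in range(total_samples):
        consumed.append(sample_queue.get())  # blocks until a sample arrives
    return consumed

# Launch asynchronous actors, then drain their samples in the learner.
threads = [threading.Thread(target=actor, args=(i,)) for i in range(NUM_ACTORS)]
for t in threads:
    t.start()

batch = learner(NUM_ACTORS * STEPS_PER_ACTOR)
for t in threads:
    t.join()

print(len(batch))  # 40 transitions gathered from 4 asynchronous actors
```

Because the learner only blocks on the queue, adding more actor threads (or processes on other machines) raises sample throughput and diversity without changing the learner loop, which is the core benefit the framework claims.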