Multi-agent reinforcement learning tasks place a high demand on the volume of training samples. Unlike its single-agent counterpart, distributed value-based multi-agent reinforcement learning faces unique challenges: demanding data transfer, inter-process communication management, and a high requirement for exploration. We propose a containerized learning framework to address these problems. We pack several environment instances, a local learner, a buffer, and a carefully designed non-blocking multi-queue manager into each container. The local policies of each container are encouraged to be as diverse as possible, and only the trajectories with the highest priority are sent to a global learner. In this way, we obtain a scalable, time-efficient, and diverse distributed MARL framework with high system throughput. To our knowledge, our method is the first to solve the challenging Google Research Football full game $5\_v\_5$. On the StarCraft II micromanagement benchmark, our method achieves $4$-$18\times$ better results than state-of-the-art non-distributed MARL algorithms.
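A minimal sketch of the per-container data flow described above may help fix ideas: environment workers push trajectories into bounded queues with non-blocking puts, so a slow consumer never stalls the actors, and only the highest-priority trajectories are forwarded to the global learner. The names used here (`PrioritizedTrajectory`, `MultiQueueManager`, the top-k drain) are illustrative assumptions, not the paper's actual API.

```python
import heapq
import queue
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(order=True)
class PrioritizedTrajectory:
    priority: float
    data: Any = field(compare=False)  # the trajectory payload itself is not compared


class MultiQueueManager:
    """Non-blocking fan-in of actor trajectories; keeps only the top-k by priority."""

    def __init__(self, num_actors: int, capacity: int = 64, top_k: int = 8):
        # One bounded queue per actor so a single slow actor cannot block the rest.
        self.queues = [queue.Queue(maxsize=capacity) for _ in range(num_actors)]
        self.top_k = top_k

    def put(self, actor_id: int, priority: float, trajectory: Any) -> bool:
        """Actor-side insert; drops the sample instead of blocking when the queue is full."""
        try:
            self.queues[actor_id].put_nowait(PrioritizedTrajectory(priority, trajectory))
            return True
        except queue.Full:
            return False  # never block the environment loop

    def drain_top_k(self) -> List[PrioritizedTrajectory]:
        """Learner-side drain: collect everything available, keep the highest-priority items."""
        pending = []
        for q in self.queues:
            while True:
                try:
                    pending.append(q.get_nowait())
                except queue.Empty:
                    break
        # Only these would be forwarded to the global learner.
        return heapq.nlargest(self.top_k, pending)
```

The key design choice in this sketch is that both insertion and draining are non-blocking, which is one way to keep environment stepping, local learning, and global updates decoupled inside a container under the stated assumptions.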