Most existing multi-agent reinforcement learning (MARL) methods are limited in the scale of problems they can handle. In particular, as the number of agents increases, their training costs grow exponentially. In this paper, we address this limitation by introducing a scalable MARL method called Distributed multi-Agent Reinforcement Learning with One-hop Neighbors (DARL1N). DARL1N is an off-policy actor-critic method that breaks the curse of dimensionality by decoupling the global interactions among agents and restricting information exchanges to one-hop neighbors. Each agent optimizes its action value and policy functions over a one-hop neighborhood, significantly reducing the learning complexity while maintaining expressiveness by training with varying numbers and states of neighbors. This structure allows us to formulate a distributed learning framework that further speeds up the training procedure. Comparisons with state-of-the-art MARL methods show that DARL1N significantly reduces training time without sacrificing policy quality and scales well as the number of agents increases.
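To make the one-hop-neighborhood idea concrete, the sketch below (not the authors' code) shows how an agent's critic could condition only on its own state-action pair and those of its one-hop neighbors, with zero-padding and a mask so the same network handles a varying number of neighbors. The class name `OneHopCritic`, the dimensions, and the padding scheme are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a per-agent critic restricted to one-hop neighbors.
# All names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class OneHopCritic(nn.Module):
    def __init__(self, state_dim, action_dim, max_neighbors, hidden=64):
        super().__init__()
        # Input: own (state, action) plus up to `max_neighbors` neighbor
        # (state, action) pairs; absent neighbors are zero-padded and masked.
        in_dim = (1 + max_neighbors) * (state_dim + action_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, own_sa, neigh_sa, neigh_mask):
        # own_sa:     (B, state_dim + action_dim)
        # neigh_sa:   (B, max_neighbors, state_dim + action_dim), zero-padded
        # neigh_mask: (B, max_neighbors), 1 for present neighbors, 0 otherwise
        neigh_sa = neigh_sa * neigh_mask.unsqueeze(-1)   # drop absent neighbors
        x = torch.cat([own_sa, neigh_sa.flatten(1)], dim=-1)
        return self.net(x)                               # Q_i over one-hop neighborhood

# Example: 4-dim states, 2-dim actions, at most 3 one-hop neighbors.
critic = OneHopCritic(state_dim=4, action_dim=2, max_neighbors=3)
own = torch.randn(8, 6)                   # batch of 8 own state-action pairs
neigh = torch.randn(8, 3, 6)              # padded neighbor state-action pairs
mask = torch.tensor([[1., 1., 0.]] * 8)   # only 2 of 3 neighbor slots occupied
q_values = critic(own, neigh, mask)       # shape (8, 1)
```

Because the critic's input size depends only on the maximum one-hop neighborhood rather than the total number of agents, its complexity stays fixed as the team grows, which is the property the abstract attributes to DARL1N.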