Multi-Agent Reinforcement Learning (MARL) is a promising area of research for modeling and controlling multiple autonomous decision-making agents. During online training, MARL algorithms involve performance-intensive computations, such as the exploration and exploitation phases, arising from the large joint observation-action spaces of multiple agents. In this article, we characterize the scalability bottlenecks in several popular classes of MARL algorithms during their training phases. Our experimental results reveal new insights into the key modules of MARL algorithms that limit scalability, and outline potential strategies that may help address these performance issues.