Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we consistently evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase~\citep{samvelyan19smac} to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.
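As a loose illustration of the ``parameter sharing'' design choice mentioned above, the sketch below shows the general idea in PyTorch; this is not EPyMARL's actual API, and the function name, network architecture, and layer sizes are hypothetical:

\begin{verbatim}
import torch.nn as nn

def build_agent_networks(n_agents, obs_dim, n_actions,
                         share_params=True):
    # One small policy network per agent; with parameter
    # sharing, every agent holds a reference to the same
    # module, so gradients from all agents update a single
    # set of weights.
    def make_net():
        return nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )
    if share_params:
        shared = make_net()
        return [shared] * n_agents
    # Without sharing, each agent learns its own parameters.
    return [make_net() for _ in range(n_agents)]
\end{verbatim}

Sharing parameters reduces the number of trainable weights and lets experience from all agents train one network, while independent parameters allow agents to specialise; exposing this as a configuration switch is one way a codebase can make such implementation details explicit.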