Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we provide a systematic evaluation and comparison of three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.
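To illustrate what "parameter sharing" means as an implementation detail, the sketch below contrasts shared and independent policy parameters for a set of agents. This is a minimal, hypothetical illustration in plain NumPy, not EPyMARL's actual API; the function names (`build_policies`, `act`) and the one-hot agent-id augmentation are assumptions made for the example, though the id-augmentation trick is the standard way to let agents behave differently under a shared network.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    # Parameters of a single linear layer (toy stand-in for a policy network).
    return {"W": rng.standard_normal((n_in, n_out)), "b": np.zeros(n_out)}

def build_policies(n_agents, obs_dim, n_actions, share_params, rng):
    """Return one parameter set per agent.

    With parameter sharing, every agent holds a reference to the SAME
    parameters, and the observation is augmented with a one-hot agent id
    so behaviours can still differ across agents. Without sharing, each
    agent gets its own independently initialised parameters.
    """
    if share_params:
        shared = init_layer(obs_dim + n_agents, n_actions, rng)  # +id one-hot
        return [shared] * n_agents
    return [init_layer(obs_dim, n_actions, rng) for _ in range(n_agents)]

def act(params, obs, agent_id, n_agents, share_params):
    # Greedy action from a linear policy; shared policies see the agent id.
    if share_params:
        obs = np.concatenate([obs, np.eye(n_agents)[agent_id]])
    logits = obs @ params["W"] + params["b"]
    return int(np.argmax(logits))

rng = np.random.default_rng(0)
shared = build_policies(3, 4, 2, share_params=True, rng=rng)
independent = build_policies(3, 4, 2, share_params=False, rng=rng)
assert shared[0] is shared[1]            # one set of weights for all agents
assert independent[0] is not independent[1]  # separate weights per agent
```

Toggling a flag like `share_params` changes the number of trainable parameters and, as the paper's experiments examine, can substantially affect learning performance in cooperative tasks.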