Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, and value decomposition) in a diverse range of fully cooperative multi-agent learning tasks. Our experiments can serve as a reference for the expected performance of algorithms across different learning tasks. We also provide further insight into (1) when independent learning might be surprisingly effective despite non-stationarity, (2) when centralised training should (and shouldn't) be applied, and (3) what benefits value decomposition can bring.
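To make the third algorithm class concrete, below is a minimal sketch of the value-decomposition idea in the style of VDN: each agent learns its own utility network, and the joint action-value is taken to be the sum of the chosen per-agent utilities, so a single team reward can train all agents. This is an illustrative assumption about one representative method, not the paper's implementation; the class names, network sizes, and dimensions are hypothetical.

```python
# Minimal VDN-style value-decomposition sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn


class AgentUtility(nn.Module):
    """Per-agent utility network Q_i(o_i, a_i)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns Q_i(o_i, .) for all actions: shape (batch, n_actions).
        return self.net(obs)


class VDNMixer(nn.Module):
    """Joint value as the sum of chosen per-agent utilities."""

    def forward(self, chosen_qs: torch.Tensor) -> torch.Tensor:
        # chosen_qs: (batch, n_agents) -> joint Q: (batch,)
        return chosen_qs.sum(dim=1)


if __name__ == "__main__":
    n_agents, obs_dim, n_actions, batch = 3, 8, 5, 4
    agents = [AgentUtility(obs_dim, n_actions) for _ in range(n_agents)]
    mixer = VDNMixer()

    obs = torch.randn(batch, n_agents, obs_dim)
    actions = torch.randint(n_actions, (batch, n_agents))

    # Gather Q_i(o_i, a_i) for the action each agent actually took.
    chosen = torch.stack(
        [agents[i](obs[:, i]).gather(1, actions[:, i : i + 1]).squeeze(1)
         for i in range(n_agents)],
        dim=1,
    )
    # The joint value would be regressed towards a TD target built from the
    # shared team reward; summation makes per-agent credit assignment implicit.
    q_joint = mixer(chosen)
    print(q_joint.shape)  # torch.Size([4])
```

In contrast, independent learning would train each `AgentUtility` on the team reward with no mixer at all, and centralised multi-agent policy gradient methods would instead condition a critic on joint information during training.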