Recent years have seen revolutionary breakthroughs in the field of Multi-Agent Deep Reinforcement Learning (MADRL), with successful applications to complex scenarios such as computer games and robot swarms. We investigate the impact of "implementation tricks" in state-of-the-art (SOTA) QMIX-based algorithms. First, we find that tricks presented as auxiliary details to the core algorithm, and thus seemingly of secondary importance, in fact have an enormous impact on performance. Our finding demonstrates that, after minimal tuning, QMIX attains extraordinarily high win rates and achieves SOTA performance on the StarCraft Multi-Agent Challenge (SMAC). Furthermore, we find that QMIX's monotonicity constraint improves sample efficiency in certain cooperative tasks. To verify the importance of the monotonicity constraint, we propose a new policy-based algorithm, RIIT, which achieves SOTA performance among policy-based algorithms. Finally, we prove that purely cooperative tasks can be represented by monotonic mixing networks. We open-source the code at \url{https://github.com/hijkzzz/pymarl2}.
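For reference, the monotonicity constraint discussed above is the standard QMIX condition: the joint action-value must be monotone in each agent's individual utility, which QMIX enforces by restricting the mixing network to non-negative weights. In the usual notation (not defined in this abstract), $Q_{tot}$ is the joint action-value, $Q_i$ is agent $i$'s utility, and $\boldsymbol{\tau}$, $\mathbf{u}$ denote the joint action-observation history and joint action:
\begin{equation}
\frac{\partial Q_{tot}(\boldsymbol{\tau}, \mathbf{u})}{\partial Q_i(\tau_i, u_i)} \ge 0, \quad \forall i \in \{1, \dots, n\}.
\end{equation}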