Learning to collaborate is critical in Multi-Agent Reinforcement Learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, which is typically characterized by Mutual Information (MI) in different forms. However, we reveal that sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder learning towards better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is to maximize the MI associated with superior collaborative behaviors and minimize the MI associated with inferior ones. The two MI objectives play complementary roles: facilitating better collaboration while avoiding falling into sub-optimal behaviors. Experiments on a wide range of MARL benchmarks show the superior performance of PMIC compared with other algorithms.
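As an illustrative sketch only (the weighting coefficients, the MI estimator, and the superior/inferior experience sets below are assumed placeholders rather than the paper's exact formulation), the dual-MI idea can be summarized as augmenting a standard MARL loss with two MI terms estimated on separately maintained superior and inferior collaborative experience:

\[
% Illustration only: \alpha, \beta, \mathcal{D}_{\mathrm{sup}}, and \mathcal{D}_{\mathrm{inf}} are assumed placeholders.
\mathcal{L}_{\mathrm{PMIC}}
\;=\;
\mathcal{L}_{\mathrm{RL}}
\;-\; \alpha \,\hat{I}\!\left(s;\,\boldsymbol{a} \,\middle|\, \mathcal{D}_{\mathrm{sup}}\right)
\;+\; \beta \,\hat{I}\!\left(s;\,\boldsymbol{a} \,\middle|\, \mathcal{D}_{\mathrm{inf}}\right),
\]

where \(\hat{I}(s;\boldsymbol{a})\) denotes a neural estimate of the MI between the global state \(s\) and the joint action \(\boldsymbol{a}\), and \(\mathcal{D}_{\mathrm{sup}}\) / \(\mathcal{D}_{\mathrm{inf}}\) collect superior and inferior collaborative behaviors, respectively; maximizing the first MI term encourages the strongly coordinated behaviors found in superior experience, while minimizing the second pushes the policy away from sub-optimal yet still strongly correlated behaviors.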