Role-based learning is a promising approach to improving the performance of Multi-Agent Reinforcement Learning (MARL). However, without manual assistance, current role-based methods cannot reliably discover a set of roles that effectively decompose a complex task, because they assume either a predefined role structure or practical experience in selecting hyperparameters. In this article, we propose a mathematical Structural Information principles-based Role Discovery method, namely SIRD, and then present an SIRD-optimized MARL framework, namely SR-MARL, for multi-agent collaboration. SIRD transforms role discovery into hierarchical action-space clustering. Specifically, it consists of structuralization, sparsification, and optimization modules, in which an optimal encoding tree is generated to perform abstraction for role discovery. SIRD is agnostic to specific MARL algorithms and can be flexibly integrated with various value function factorization approaches. Empirical evaluations on the StarCraft II micromanagement benchmark demonstrate that, compared with state-of-the-art MARL algorithms, the SR-MARL framework improves the average test win rate by 0.17%, 6.08%, and 3.24%, and reduces the deviation by 16.67%, 30.80%, and 66.30%, under easy, hard, and super-hard scenarios, respectively.
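To make the role-discovery-as-clustering idea concrete, the following is a minimal sketch, not the paper's actual SIRD algorithm: it replaces the optimal encoding-tree optimization with a simple greedy average-linkage agglomerative clustering over hypothetical action embeddings, grouping actions into a fixed number of roles. The function name `discover_roles` and the embedding representation are illustrative assumptions.

```python
import numpy as np

def discover_roles(action_embeddings: np.ndarray, num_roles: int) -> np.ndarray:
    """Illustrative stand-in for SIRD's role discovery: greedily merge
    the two closest clusters (average linkage on cluster centroids)
    until `num_roles` clusters remain. The real SIRD instead builds
    and optimizes an encoding tree over a sparsified action graph."""
    n = len(action_embeddings)
    clusters = [[i] for i in range(n)]  # start with singleton clusters
    while len(clusters) > num_roles:
        best_pair, best_dist = None, np.inf
        # find the pair of clusters with the closest centroids
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = action_embeddings[clusters[a]].mean(axis=0)
                cb = action_embeddings[clusters[b]].mean(axis=0)
                dist = np.linalg.norm(ca - cb)
                if dist < best_dist:
                    best_dist, best_pair = dist, (a, b)
        a, b = best_pair
        clusters[a].extend(clusters[b])  # merge b into a
        del clusters[b]
    # assign each action its role (cluster) index
    labels = np.empty(n, dtype=int)
    for role, members in enumerate(clusters):
        labels[members] = role
    return labels
```

For example, four actions whose embeddings form two well-separated groups are assigned two roles, one per group; in the full framework each role would then condition the agents' policies.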