Previous deep multi-agent reinforcement learning (MARL) algorithms have achieved impressive results, typically in homogeneous scenarios. However, heterogeneous scenarios are also very common and are usually harder to solve. In this paper, we focus on cooperative heterogeneous MARL problems in the StarCraft Multi-Agent Challenge (SMAC) environment. We first define and describe the heterogeneous problems in SMAC. To reveal and study the problem comprehensively, we design new maps in addition to the original SMAC maps, and we find that baseline algorithms fail to perform well on these heterogeneous maps. To address this issue, we propose the Grouped Individual-Global-Max Consistency (GIGM) and a novel MARL algorithm, Grouped Hybrid Q Learning (GHQ). GHQ separates agents into several groups and maintains individual parameters for each group, along with a novel hybrid structure for value factorization. To enhance coordination between groups, we maximize the Inter-group Mutual Information (IGMI) between groups' trajectories. Experiments on the original and new heterogeneous maps show the superior performance of GHQ compared to other state-of-the-art algorithms.
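To make the grouping idea concrete, the following is a minimal sketch (not the authors' implementation) of a grouped value-factorization module: agents are partitioned into groups, each group owns its own Q-network parameters, and per-group values are mixed monotonically into a joint value so that a grouped IGM-style condition holds. The class name GroupedQ, the two-layer network size, and the simple summation mixer are illustrative assumptions; the IGMI objective is omitted here.

```python
# Minimal sketch of grouped value factorization (illustrative, not the paper's code).
import torch
import torch.nn as nn


class GroupedQ(nn.Module):
    def __init__(self, obs_dim, n_actions, groups):
        super().__init__()
        # groups: list of agent-index lists, e.g. [[0, 1], [2, 3, 4]] for two
        # heterogeneous unit types. Parameters are shared within a group only.
        self.groups = groups
        self.group_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                          nn.Linear(64, n_actions))
            for _ in groups
        ])

    def forward(self, obs):
        # obs: (batch, n_agents, obs_dim)
        q_tot = 0.0
        for net, members in zip(self.group_nets, self.groups):
            q = net(obs[:, members])  # (batch, |group|, n_actions)
            # Greedy per-agent values summed monotonically, so the joint
            # argmax decomposes across groups in this simplified mixer.
            q_tot = q_tot + q.max(dim=-1).values.sum(dim=-1)
        return q_tot  # (batch,)


# Toy usage: 5 agents split into 2 heterogeneous groups.
model = GroupedQ(obs_dim=16, n_actions=6, groups=[[0, 1], [2, 3, 4]])
print(model(torch.randn(8, 5, 16)).shape)  # torch.Size([8])
```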