Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination graph based formalizations allow reasoning about the joint action based on the structure of interactions. However, they often require domain expertise in their design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure, which is then used by a graph neural network based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods to significantly improve coordination for domains with a large number of agents. We apply DICG to both centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.
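The following is a minimal sketch of the two-module architecture described above, assuming a PyTorch implementation: an attention-based module infers a soft (dense, differentiable) coordination graph over agents, and GNN-style message passing over that graph produces per-agent outputs. Module names, dimensions, and the specific attention formulation are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitCoordinationGraph(nn.Module):
    """Infers a soft coordination graph from per-agent observations, then
    propagates agent embeddings over it with graph message passing."""

    def __init__(self, obs_dim, embed_dim, n_actions, n_gnn_layers=2):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, embed_dim)        # per-agent observation encoder
        self.attn_query = nn.Linear(embed_dim, embed_dim)   # attention used to infer edge weights
        self.attn_key = nn.Linear(embed_dim, embed_dim)
        self.gnn_layers = nn.ModuleList(
            nn.Linear(embed_dim, embed_dim) for _ in range(n_gnn_layers)
        )
        self.policy_head = nn.Linear(embed_dim, n_actions)  # per-agent action logits

    def forward(self, obs):
        # obs: (n_agents, obs_dim)
        h = F.relu(self.encoder(obs))                        # (n_agents, embed_dim)
        # Soft adjacency matrix: attention scores between every pair of agents.
        scores = self.attn_query(h) @ self.attn_key(h).T / h.shape[-1] ** 0.5
        adj = F.softmax(scores, dim=-1)                      # (n_agents, n_agents)
        # Message passing over the inferred graph; residual keeps own features.
        for layer in self.gnn_layers:
            h = F.relu(layer(adj @ h)) + h
        return self.policy_head(h), adj                      # logits and inferred graph

# Usage: 5 agents, 16-dim observations, 4 discrete actions.
model = ImplicitCoordinationGraph(obs_dim=16, embed_dim=32, n_actions=4)
logits, adj = model(torch.randn(5, 16))
print(logits.shape, adj.shape)  # torch.Size([5, 4]) torch.Size([5, 5])
```

Because the inferred adjacency matrix is fully differentiable, such a module can be trained end-to-end with standard actor-critic objectives, and the per-agent heads can be executed either centrally or in a decentralized manner.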