Deep cooperative multi-agent reinforcement learning has demonstrated remarkable success across a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions intertwined, which easily leads to overfitting to noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method that disentangles not only the joint value function into agent-wise value functions for decentralized execution, but also the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a sub-group of the entities. OPT facilitates filtering out noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among the discovered interaction prototypes. The model then selectively restructures these prototypes into a compact interaction pattern via an aggregator with learnable weights. To alleviate the training instability caused by partial observability, we propose to maximize the mutual information between the aggregation weights and the historical behaviors of each agent. Experiments on both single-task and multi-task benchmarks demonstrate that the proposed method yields results superior to state-of-the-art counterparts. Our code will be made publicly available.
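To make the abstract's description more concrete, the following is a minimal sketch, not the authors' implementation, of the core idea: entity interactions are split into K interaction prototypes, each an attention pattern over a sub-group of entities, and then recombined into one compact interaction pattern through aggregation weights conditioned on the agent's history. All module names, dimensions, and the softmax-based aggregation below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeAggregator(nn.Module):
    """Illustrative prototype discovery and aggregation (assumed design)."""

    def __init__(self, entity_dim: int, history_dim: int, n_prototypes: int):
        super().__init__()
        self.n_prototypes = n_prototypes
        # Separate query/key projections per prototype, so each prototype can
        # attend to a different sub-group of entities.
        self.q = nn.Linear(entity_dim, entity_dim * n_prototypes)
        self.k = nn.Linear(entity_dim, entity_dim * n_prototypes)
        # Aggregation weights predicted from the agent's history embedding:
        # one way to make the weights history-dependent, as the abstract's
        # mutual-information objective suggests.
        self.weight_head = nn.Linear(history_dim, n_prototypes)

    def forward(self, entities: torch.Tensor, history: torch.Tensor):
        # entities: (batch, n_entities, entity_dim); history: (batch, history_dim)
        b, n, d = entities.shape
        q = self.q(entities).view(b, n, self.n_prototypes, d).permute(0, 2, 1, 3)
        k = self.k(entities).view(b, n, self.n_prototypes, d).permute(0, 2, 1, 3)
        # Per-prototype attention over entity pairs: (batch, K, n, n)
        logits = torch.matmul(q, k.transpose(-1, -2)) / d ** 0.5
        prototypes = F.softmax(logits, dim=-1)
        # History-conditioned aggregation weights: (batch, K)
        agg_w = F.softmax(self.weight_head(history), dim=-1)
        # Compact interaction pattern: weighted sum over prototypes, (batch, n, n)
        pattern = torch.einsum('bk,bkij->bij', agg_w, prototypes)
        return prototypes, agg_w, pattern
```

A disagreement-style regularizer could, for example, penalize pairwise overlap between the K prototype attention maps to keep them sparse and diverse; the exact form used in OPT is not specified by the abstract.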