Deep cooperative multi-agent reinforcement learning has demonstrated remarkable success across a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions intertwined, which easily leads to overfitting on noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method, which disentangles not only the joint value function into agent-wise value functions for decentralized execution, but also the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a subgroup of the entities. OPT facilitates filtering out the noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among the discovered interaction prototypes. The model then selectively restructures these prototypes into a compact interaction pattern via an aggregator with learnable weights. To alleviate the training instability caused by partial observability, we propose maximizing the mutual information between the aggregation weights and the historical behaviors of each agent. Experiments on both single-task and multi-task benchmarks demonstrate that the proposed method yields results superior to state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/OPT.
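To make the described mechanisms concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of two ideas named above: aggregating interaction prototypes into one compact interaction pattern with learnable weights, and a disagreement-style regularizer encouraging sparsity and diversity among the prototypes' attention maps. All module names, shapes, and design details here are illustrative assumptions.

```python
# Illustrative sketch only; shapes and modules are assumptions, not the OPT codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeAggregator(nn.Module):
    def __init__(self, d_model: int, n_prototypes: int):
        super().__init__()
        # Each prototype is a learnable query attending over entity features,
        # yielding one candidate interaction pattern per prototype.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, d_model))
        # Aggregation weights are produced from the agent's own embedding.
        self.weight_head = nn.Linear(d_model, n_prototypes)

    def forward(self, entity_feats: torch.Tensor, agent_feat: torch.Tensor):
        # entity_feats: (batch, n_entities, d_model); agent_feat: (batch, d_model)
        # Prototype-to-entity attention: (batch, n_prototypes, n_entities)
        logits = torch.einsum("kd,bnd->bkn", self.prototypes, entity_feats)
        attn = F.softmax(logits, dim=-1)
        # Per-prototype interaction summaries: (batch, n_prototypes, d_model)
        proto_out = torch.einsum("bkn,bnd->bkd", attn, entity_feats)
        # Learnable aggregation weights over prototypes: (batch, n_prototypes)
        agg_w = F.softmax(self.weight_head(agent_feat), dim=-1)
        # Compact interaction pattern: (batch, d_model)
        pattern = torch.einsum("bk,bkd->bd", agg_w, proto_out)
        return pattern, attn, agg_w


def disagreement_loss(attn: torch.Tensor) -> torch.Tensor:
    """Push each prototype's attention toward sparsity (low entropy) and
    push different prototypes apart (low pairwise overlap)."""
    entropy = -(attn * attn.clamp_min(1e-8).log()).sum(-1).mean()
    overlap = torch.einsum("bkn,bjn->bkj", attn, attn)
    off_diag = overlap - torch.diag_embed(torch.diagonal(overlap, dim1=-2, dim2=-1))
    return entropy + off_diag.mean()
```

In this sketch, the aggregation weights `agg_w` play the role of the quantity whose mutual information with each agent's historical behaviors would be maximized under partial observability; that estimator is omitted here.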