Deep cooperative multi-agent reinforcement learning has demonstrated remarkable success across a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions intertwined, which easily leads to overfitting on noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method that disentangles not only the joint value function into agent-wise value functions for decentralized execution, but also the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a subgroup of the entities. OPT facilitates filtering out the noisy interactions between irrelevant entities and thus significantly improves both generalizability and interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among the discovered interaction prototypes. The model then selectively restructures these prototypes into a compact interaction pattern via an aggregator with learnable weights. To alleviate the training instability caused by partial observability, we propose maximizing the mutual information between the aggregation weights and each agent's historical behaviors. Experiments on both single-task and multi-task benchmarks demonstrate that the proposed method yields results superior to its state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/OPT.
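To make the mechanisms named above more concrete, here is a minimal, illustrative PyTorch sketch of how prototype disentangling, diversity-encouraging disagreement, and learnable-weight aggregation could be wired together. All names (`InteractionDisentangler`, `disagreement_loss`, `n_prototypes`, etc.) are hypothetical; this is not the authors' implementation, which is available at the repository linked above.

```python
# Hypothetical sketch of prototype disentangling + aggregation,
# loosely following the abstract; not the official OPT code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InteractionDisentangler(nn.Module):
    """Splits entity interactions into several prototype attention heads,
    then recombines them with learnable per-agent aggregation weights."""

    def __init__(self, entity_dim: int, n_prototypes: int = 4):
        super().__init__()
        self.n_prototypes = n_prototypes
        # One query/key projection per interaction prototype.
        self.queries = nn.ModuleList(
            nn.Linear(entity_dim, entity_dim) for _ in range(n_prototypes))
        self.keys = nn.ModuleList(
            nn.Linear(entity_dim, entity_dim) for _ in range(n_prototypes))
        self.value = nn.Linear(entity_dim, entity_dim)
        # Learnable aggregation weights over prototypes, conditioned on the agent.
        self.aggregator = nn.Linear(entity_dim, n_prototypes)

    def forward(self, agents: torch.Tensor, entities: torch.Tensor):
        # agents:   (batch, n_agents, entity_dim)
        # entities: (batch, n_entities, entity_dim)
        v = self.value(entities)                         # shared entity values
        protos, attns = [], []
        for q_proj, k_proj in zip(self.queries, self.keys):
            q = q_proj(agents)                           # (B, A, D)
            k = k_proj(entities)                         # (B, E, D)
            logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
            # Softmax over entities; a sparse transform (e.g. sparsemax or
            # entmax) would match the abstract's sparsity goal more closely.
            attn = F.softmax(logits, dim=-1)             # (B, A, E)
            attns.append(attn)
            protos.append(attn @ v)                      # (B, A, D)
        protos = torch.stack(protos, dim=2)              # (B, A, P, D)
        attns = torch.stack(attns, dim=2)                # (B, A, P, E)
        # Aggregate prototypes into one compact interaction pattern per agent.
        w = F.softmax(self.aggregator(agents), dim=-1)   # (B, A, P)
        pattern = (w.unsqueeze(-1) * protos).sum(dim=2)  # (B, A, D)
        return pattern, attns, w


def disagreement_loss(attns: torch.Tensor) -> torch.Tensor:
    """Encourages diversity by penalizing overlap between prototype attention
    maps; one simple stand-in for a disagreement-style regularizer."""
    # attns: (B, A, P, E); cosine similarity between every prototype pair.
    a = F.normalize(attns, dim=-1)
    sim = a @ a.transpose(-2, -1)                        # (B, A, P, P)
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=-2, dim2=-1))
    return off_diag.abs().mean()
```

In the full method described above, the aggregation weights `w` would additionally be tied to each agent's action-observation history through a mutual-information objective (e.g., a variational or contrastive lower bound); that term, and the agent-wise value decomposition it plugs into, are omitted from this sketch for brevity.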