Value Decomposition (VD) aims to deduce the contributions of individual agents to decentralized policies when only global rewards are available, and has recently emerged as a powerful credit assignment paradigm for cooperative Multi-Agent Reinforcement Learning (MARL). One of the main challenges in VD is promoting diverse behaviors among agents, and existing methods address it by directly encouraging diversity in the learned agent networks through various strategies. However, we argue that these dedicated agent-network designs remain limited by the indistinguishable VD network, which leads to homogeneous agent behaviors and thus degrades cooperation. In this paper, we propose a novel Contrastive Identity-Aware learning (CIA) method that explicitly boosts the credit-level distinguishability of the VD network to break the bottleneck of multi-agent diversity. Specifically, our approach leverages contrastive learning to maximize the mutual information between the temporal credits and the identity representations of different agents, encouraging the full expressiveness of credit assignment and, in turn, the emergence of individualities. The proposed CIA module is simple to implement yet effective, and can be readily incorporated into various VD architectures. Experiments on the SMAC benchmarks with different VD backbones demonstrate that the proposed method outperforms state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/CIA.
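The contrastive objective described above can be illustrated with a generic InfoNCE-style loss, which is a standard way to obtain a lower bound on mutual information. The following is only a minimal sketch under stated assumptions, not the implementation from the linked repository: the function name `info_nce_identity_loss`, the cosine-similarity scoring, the `temperature` parameter, and the representation of each agent's temporal credit as a single fixed-length vector are all hypothetical choices made for illustration.

```python
import numpy as np


def info_nce_identity_loss(credits, identities, temperature=0.1):
    """InfoNCE-style loss between per-agent credits and identity vectors.

    Hypothetical sketch (not the authors' code).
    credits:    array of shape (n_agents, d), one credit vector per agent.
    identities: array of shape (n_agents, d), one identity vector per agent.
    Agent i's credit is the positive for identity i; the identities of all
    other agents act as negatives.
    """
    eps = 1e-8  # guards against division by zero for all-zero vectors
    c = credits / (np.linalg.norm(credits, axis=1, keepdims=True) + eps)
    z = identities / (np.linalg.norm(identities, axis=1, keepdims=True) + eps)

    # Cosine similarity between every credit and every identity, scaled.
    logits = (c @ z.T) / temperature                      # (n_agents, n_agents)

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-probability of matching each credit to its own identity.
    return float(-np.mean(np.diag(log_probs)))


# Illustrative inputs (assumed): four agents with orthogonal identity vectors.
identities = np.eye(4)
distinct_credits = np.eye(4)          # each credit aligned with its own identity
homogeneous_credits = np.ones((4, 4))  # every agent has the same credit

loss_distinct = info_nce_identity_loss(distinct_credits, identities)
loss_homogeneous = info_nce_identity_loss(homogeneous_credits, identities)
```

In this toy setting the loss is close to zero when each agent's credit is distinguishable by identity, and it equals log(n_agents) when all credits are identical, which is the case where the credits carry no information about which agent they belong to. Minimizing such a loss therefore pushes credits toward being identity-distinguishable, which is the intuition the abstract describes.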