Utilizing messages from teammates can improve coordination in cooperative Multi-Agent Reinforcement Learning (MARL). Previous works typically combine raw messages from teammates with local information as inputs to the policy. However, neglecting message aggregation makes policy learning significantly less efficient. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in cooperative MARL. In this paper, we propose Multi-Agent communication via Self-supervised Information Aggregation (MASIA), in which agents aggregate received messages into compact, highly relevant representations that augment their local policies. Specifically, we design a permutation-invariant message encoder that produces a common information-aggregated representation from the messages, and we optimize it in a self-supervised manner by reconstructing and shooting future information. Each agent then uses a novel message extraction mechanism to select the parts of the aggregated representation most relevant to its own decision-making. Furthermore, considering the potential of offline learning for real-world applications, we build offline benchmarks for multi-agent communication, which are, to the best of our knowledge, the first of their kind. Empirical results demonstrate the superiority of our method in both online and offline settings. We also release the offline benchmarks built in this paper as a testbed for validating communication ability, to facilitate future research.
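As a rough illustration of the two ideas the abstract mentions (order-invariant aggregation of received messages and a self-supervised reconstruction objective on the aggregated representation), the following is a minimal PyTorch sketch. The module names, dimensions, and the use of mean pooling with an MSE reconstruction target are illustrative assumptions, not the paper's actual architecture or objective.

```python
import torch
import torch.nn as nn


class PermutationInvariantEncoder(nn.Module):
    """Aggregates a variable-sized set of teammate messages into a fixed-size
    representation. Pooling per-message embeddings by their mean makes the
    output invariant to the order in which messages arrive."""

    def __init__(self, msg_dim: int, hidden_dim: int, z_dim: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(msg_dim, hidden_dim), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden_dim, z_dim), nn.ReLU())

    def forward(self, messages: torch.Tensor) -> torch.Tensor:
        # messages: (batch, n_teammates, msg_dim)
        pooled = self.phi(messages).mean(dim=1)  # order-invariant pooling
        return self.rho(pooled)                  # aggregated representation z


class ReconstructionHead(nn.Module):
    """Self-supervised decoder: trains the aggregated representation to
    reconstruct a (hypothetical) global observation so that it retains the
    information carried by the raw messages."""

    def __init__(self, z_dim: int, obs_dim: int):
        super().__init__()
        self.decoder = nn.Linear(z_dim, obs_dim)

    def loss(self, z: torch.Tensor, global_obs: torch.Tensor) -> torch.Tensor:
        return nn.functional.mse_loss(self.decoder(z), global_obs)


if __name__ == "__main__":
    encoder = PermutationInvariantEncoder(msg_dim=8, hidden_dim=32, z_dim=16)
    recon = ReconstructionHead(z_dim=16, obs_dim=24)

    msgs = torch.randn(4, 5, 8)  # 4 parallel envs, 5 teammates, 8-dim messages
    z = encoder(msgs)

    # Permutation invariance: shuffling the message order leaves z unchanged.
    z_shuffled = encoder(msgs[:, torch.randperm(5), :])
    assert torch.allclose(z, z_shuffled, atol=1e-5)

    # Self-supervised reconstruction loss against a dummy global observation.
    loss = recon.loss(z, torch.randn(4, 24))
    loss.backward()
```

In this sketch the encoder plays the role of the common information-aggregated representation described above; a per-agent extraction mechanism and the future-shooting objective would sit on top of it and are omitted here.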