Communication lays the foundation for human cooperation. It is also crucial for multi-agent cooperation. However, existing work focuses on broadcast communication, which is not only impractical but also leads to information redundancy that can even impair the learning process. To tackle these difficulties, we propose Individually Inferred Communication (I2C), a simple yet effective model that enables agents to learn a prior for agent-agent communication. The prior knowledge is learned via causal inference and realized by a feed-forward neural network that maps an agent's local observation to a belief about whom to communicate with. The influence of one agent on another is inferred via the joint action-value function in multi-agent reinforcement learning and quantified to label the necessity of agent-agent communication. Furthermore, the agent policy is regularized to better exploit communicated messages. Empirically, we show that I2C not only reduces communication overhead but also improves performance in a variety of multi-agent cooperative scenarios, compared with existing methods. The code is available at https://github.com/PKU-AI-Edge/I2C.
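To make the two core ideas concrete, the following is a minimal, purely illustrative sketch (not the paper's implementation): a tiny feed-forward prior network that maps an agent's local observation to a per-peer belief about whether communication is necessary, and a toy causal-influence measure that compares an agent's Q-induced policy conditioned on a peer's action against the policy with that peer marginalized out. All names, dimensions, and the exact influence formula here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


class PriorNetwork:
    """Hypothetical sketch of I2C's prior network: a small feed-forward
    net mapping an agent's local observation to a belief (probability)
    that communicating with each peer is necessary."""

    def __init__(self, obs_dim, n_peers, hidden=32):
        # Randomly initialized weights; in practice these are trained
        # against labels derived from the causal-influence measure.
        self.W1 = rng.normal(0, 0.1, (obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, n_peers))
        self.b2 = np.zeros(n_peers)

    def belief(self, obs):
        h = np.tanh(obs @ self.W1 + self.b1)
        logits = h @ self.W2 + self.b2
        return 1.0 / (1.0 + np.exp(-logits))  # sigmoid per peer

    def request_mask(self, obs, threshold=0.5):
        # Request communication only from peers whose belief is high,
        # instead of broadcasting to everyone.
        return self.belief(obs) > threshold


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def causal_influence(q_ij, a_j):
    """Toy influence measure: KL divergence between agent i's
    Q-induced policy conditioned on peer j's action a_j and the
    policy with j's action marginalized out.  A large value would
    label communication with j as necessary."""
    p_cond = softmax(q_ij[:, a_j])          # policy given j's action
    p_marg = softmax(q_ij.mean(axis=1))     # policy ignoring j
    return float(np.sum(p_cond * np.log(p_cond / p_marg)))


net = PriorNetwork(obs_dim=8, n_peers=3)
obs = rng.normal(size=8)
beliefs = net.belief(obs)          # per-peer communication beliefs in (0, 1)
mask = net.request_mask(obs)       # boolean mask of peers to contact

q_toy = rng.normal(size=(4, 4))    # toy joint Q over (a_i, a_j)
kl = causal_influence(q_toy, a_j=1)
```

The threshold on the influence measure is what turns the continuous quantity into the binary "communicate or not" labels used to train the prior network in this sketch.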