Communication is a core component of cooperative multi-agent reinforcement learning (MARL). In many real applications, however, the communication bandwidth is subject to constraints. To improve communication efficiency, in this article we propose to jointly optimize, for each agent in MARL, whom to communicate with and what to communicate. Starting from a directed complete graph over the agents, we propose a novel communication model, named Communicative Graph Information Bottleneck Network (CGIBNet), which simultaneously compresses the graph structure and the node information using the graph information bottleneck principle. Graph structure compression prunes redundant edges and thereby determines whom to communicate with. Node information compression addresses what to communicate by learning compact node representations. Moreover, CGIBNet is the first universal module for bandwidth-constrained communication: it can be applied to various training frameworks (i.e., policy-based and value-based MARL frameworks) and communication modes (i.e., single-round and multi-round communication). Extensive experiments are conducted in Traffic Control and StarCraft II environments. The results indicate that our method achieves better performance than state-of-the-art algorithms in bandwidth-constrained settings, especially on large-scale multi-agent tasks.
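To make the two compression targets concrete, the following is a minimal illustrative sketch, not the CGIBNet model itself. It assumes hard top-k pruning of incoming edges as a stand-in for learned graph structure compression, and a fixed linear projection as a stand-in for learned compact node representations. All function names, the `scores` dictionary, and the `weights` matrix are hypothetical and do not come from the article.

```python
def complete_directed_graph(n):
    """All ordered (sender, receiver) pairs with sender != receiver."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]


def prune_edges(edges, scores, k):
    """Keep, for each receiver, its k highest-scoring incoming edges.

    Assumption: `scores` maps each edge to a relevance value. In a learned
    model this would be produced by the network; here it is given as input.
    """
    by_receiver = {}
    for edge in edges:
        by_receiver.setdefault(edge[1], []).append(edge)
    kept = []
    for incoming in by_receiver.values():
        ranked = sorted(incoming, key=lambda e: scores[e], reverse=True)
        kept.extend(ranked[:k])
    return kept


def compress(message, weights):
    """Project a message onto len(weights) dimensions (one per weight row)."""
    return [sum(w * x for w, x in zip(row, message)) for row in weights]


def communicate(observations, scores, weights, k):
    """One communication round: prune edges, compress messages, average inbox."""
    n = len(observations)
    edges = prune_edges(complete_directed_graph(n), scores, k)
    compact = [compress(obs, weights) for obs in observations]
    dim = len(weights)
    inbox = [[0.0] * dim for _ in range(n)]
    counts = [0] * n
    for sender, receiver in edges:
        for t in range(dim):
            inbox[receiver][t] += compact[sender][t]
        counts[receiver] += 1
    # Average the received messages; agents that receive nothing keep zeros.
    inbox = [[v / c for v in vec] if c else vec for vec, c in zip(inbox, counts)]
    return edges, inbox


if __name__ == "__main__":
    n_agents = 4
    obs = [[float(i)] * 4 for i in range(n_agents)]
    all_edges = complete_directed_graph(n_agents)
    # Hypothetical scores: prefer senders with a higher index.
    edge_scores = {e: float(e[0]) for e in all_edges}
    proj = [[1, 0, 0, 0], [0, 1, 1, 0]]  # 4-dim observation -> 2-dim message
    kept_edges, received = communicate(obs, edge_scores, proj, k=1)
    print(len(all_edges), "edges before pruning,", len(kept_edges), "after")
    print(received)
```

In this sketch the bandwidth budget is controlled by two knobs: `k` limits how many senders each agent listens to, and the number of rows in `weights` limits the size of each message. A learned approach would instead trade task performance against both quantities through an information bottleneck objective.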