We consider the problem of limited-bandwidth communication in multi-agent reinforcement learning, where agents cooperate with the assistance of a communication protocol and a scheduler. Together, the protocol and the scheduler determine which agent communicates what message, and to whom. Under a limited-bandwidth constraint, the communication protocol must generate informative messages. Meanwhile, unnecessary communication connections should not be established, because they waste limited resources. In this paper, we develop an Informative Multi-Agent Communication (IMAC) method to learn efficient communication protocols as well as scheduling. First, from the perspective of communication theory, we prove that the limited-bandwidth constraint requires low-entropy messages throughout the transmission. Then, inspired by the information bottleneck principle, we learn a valuable and compact communication protocol and a weight-based scheduler. To demonstrate the efficiency of our method, we conduct extensive experiments on various cooperative and competitive multi-agent tasks with different numbers of agents and different bandwidths. We show that IMAC converges faster and leads to more efficient communication among agents under limited bandwidth than many baseline methods.
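The abstract does not state IMAC's exact objective, so the following is only a minimal NumPy sketch, under stated assumptions, of the ideas it names. It assumes Gaussian messages and uses a KL-to-standard-normal penalty (a common variational information-bottleneck surrogate for message "rate"), models the bandwidth limit with the Shannon-Hartley capacity of an AWGN channel, and uses a softmax over relevance scores as a stand-in for a weight-based scheduler. All function names, and the `beta` parameter, are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def gaussian_kl_to_std_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in nats, summed over message dims.

    Assumption: in a variational information-bottleneck setup, this KL
    upper-bounds how much information a message carries about its input.
    """
    var = np.exp(log_var)
    return 0.5 * np.sum(var + mu**2 - 1.0 - log_var, axis=-1)


def ib_message_loss(task_loss, mu, log_var, beta=1e-3):
    """Hypothetical IB-style objective: task loss plus a beta-weighted rate penalty."""
    return task_loss + beta * np.mean(gaussian_kl_to_std_normal(mu, log_var))


def shannon_capacity_bits_per_sec(bandwidth_hz, snr):
    """Shannon-Hartley capacity of an AWGN channel: B * log2(1 + S/N)."""
    return bandwidth_hz * np.log2(1.0 + snr)


def rate_fits_channel(bits_per_message, messages_per_sec, bandwidth_hz, snr):
    """Reliable transmission is possible only if the source entropy rate
    does not exceed the channel capacity."""
    source_rate = bits_per_message * messages_per_sec
    return source_rate <= shannon_capacity_bits_per_sec(bandwidth_hz, snr)


def schedule_weights(scores):
    """Hypothetical weight-based scheduler: a row-wise softmax over
    sender relevance scores, shape (n_receivers, n_senders)."""
    z = scores - np.max(scores, axis=-1, keepdims=True)  # numerical stability
    w = np.exp(z)
    return w / np.sum(w, axis=-1, keepdims=True)


def aggregate_messages(scores, messages):
    """Each receiver gets a weighted sum of senders' messages.

    scores: (n_receivers, n_senders); messages: (n_senders, msg_dim).
    """
    return schedule_weights(scores) @ messages


if __name__ == "__main__":
    mu, log_var = np.zeros((4, 8)), np.zeros((4, 8))
    print(gaussian_kl_to_std_normal(mu, log_var))  # zero rate: message matches the prior
    print(rate_fits_channel(16, 100, 1000, 3))     # 1600 bit/s vs 2000 bit/s capacity
```

The design point the sketch tries to make concrete: lowering the KL term lowers the information (and hence the entropy budget) each message consumes, which is what lets the message stream fit under a fixed channel capacity; the scheduler then decides how much of each sender's message each receiver actually uses.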