Effective Communications: A Joint Learning and Communication Framework for Multi-Agent Reinforcement Learning over Noisy Channels

We propose a novel formulation of the "effectiveness problem" in communications, put forth by Shannon and Weaver in their seminal work [2], by considering multiple agents communicating over a noisy channel in order to achieve better coordination and cooperation in a multi-agent reinforcement learning (MARL) framework. Specifically, we consider a multi-agent partially observable Markov decision process (MA-POMDP), in which the agents, in addition to interacting with the environment, can also communicate with each other over a noisy communication channel. The noisy communication channel is considered explicitly as part of the dynamics of the environment, and the message each agent sends is part of the action that the agent can take. As a result, the agents learn not only to collaborate with each other but also to communicate "effectively" over a noisy channel. This framework generalizes both the traditional communication problem, where the main goal is to convey a message reliably over a noisy channel, and the "learning to communicate" framework that has received recent attention in the MARL literature, where the underlying communication channels are assumed to be error-free. We show via examples that the joint policy learned using the proposed framework is superior to one in which the communication is considered separately from the underlying MA-POMDP. The framework has many real-world applications, from autonomous vehicle planning to drone swarm control, and opens up the rich toolbox of deep reinforcement learning for the design of multi-user communication systems.
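To make the formulation concrete, the following is a minimal sketch of an environment step in which the message is part of each agent's action and the channel noise is part of the environment dynamics. It is an illustration under stated assumptions, not the setup used in this work: the two-agent line world, the binary symmetric channel, and the names `AgentAction`, `binary_symmetric_channel`, and `NoisyLineWorld` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentAction:
    """Joint action of one agent (hypothetical structure)."""
    move: int        # environment action: -1, 0, or +1 along a line
    message: tuple   # bits transmitted to the other agent


def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (assumed channel model)."""
    return tuple(b ^ (rng.random() < flip_prob) for b in bits)


class NoisyLineWorld:
    """Toy two-agent world: each agent observes only its own position
    plus whatever the noisy channel delivered from the other agent."""

    def __init__(self, size=5, flip_prob=0.1, seed=0):
        self.size = size
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)
        self.positions = [0, size - 1]
        self.inbox = [(0,), (0,)]

    def step(self, actions):
        # Environment part of the action: move, clipped to the line.
        for i, act in enumerate(actions):
            new_pos = self.positions[i] + act.move
            self.positions[i] = min(self.size - 1, max(0, new_pos))
        # Communication part of the action: the message sent by agent i
        # passes through the channel before reaching the other agent, so
        # the noise is part of the state transition itself.
        for i, act in enumerate(actions):
            self.inbox[1 - i] = binary_symmetric_channel(
                act.message, self.flip_prob, self.rng
            )
        # Shared reward: 1 when the two agents meet.
        reward = 1.0 if self.positions[0] == self.positions[1] else 0.0
        observations = [(self.positions[i], self.inbox[i]) for i in range(2)]
        return observations, reward
```

Because the received message appears only in the next observation, a policy trained on this environment has to choose what to transmit with the channel noise in mind, rather than relying on a separately designed error-free link.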
