In Multi-Agent Reinforcement Learning, communication is critical for encouraging cooperation among agents. Communication in realistic wireless networks can be highly unreliable due to network conditions that vary with agents' mobility and due to stochasticity in the transmission process. We propose a framework for learning practical communication strategies by addressing three fundamental questions: (1) When: agents learn the timing of communication based not only on message importance but also on wireless channel conditions. (2) What: agents augment message contents with wireless network measurements to better select game and communication actions. (3) How: agents use a novel neural message encoder that preserves all information from received messages, regardless of their number and order. Simulating standard benchmarks under realistic wireless network settings, we show significant improvements in game performance, convergence speed, and communication efficiency compared with state-of-the-art methods.
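To make the "How" component concrete, the sketch below shows one standard way to build an encoder that is invariant to the number and order of received messages: a Deep Sets-style architecture that embeds each message independently and aggregates with a symmetric pooling operation. This is an illustrative assumption, not the paper's actual encoder; the class name `SetMessageEncoder` and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class SetMessageEncoder(nn.Module):
    """Permutation-invariant message encoder (Deep Sets-style sketch).

    Embeds each received message independently with phi, then pools with a
    symmetric operation (sum), so the output is unchanged under any
    reordering of messages and is defined for any message count.
    """

    def __init__(self, msg_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.phi = nn.Sequential(          # per-message embedding
            nn.Linear(msg_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.rho = nn.Sequential(          # post-pooling transform
            nn.Linear(hidden_dim, out_dim), nn.ReLU(),
        )

    def forward(self, messages: torch.Tensor) -> torch.Tensor:
        # messages: (num_received, msg_dim); the count may vary per step,
        # e.g. when wireless transmissions are dropped.
        if messages.shape[0] == 0:
            pooled = torch.zeros(1, self.hidden_dim)  # nothing received
        else:
            pooled = self.phi(messages).sum(dim=0, keepdim=True)  # symmetric
        return self.rho(pooled)


# Usage: the encoding does not depend on message order or arrival count.
enc = SetMessageEncoder(msg_dim=8, hidden_dim=32, out_dim=16)
msgs = torch.randn(3, 8)                      # 3 messages from neighbors
perm = msgs[torch.randperm(3)]                # same messages, shuffled
assert torch.allclose(enc(msgs), enc(perm))   # order-invariant output
```

Sum pooling is one of several symmetric aggregators (mean, max, or attention-based pooling would also work); any of them yields an encoding that degrades gracefully as messages are lost in transmission.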