Decentralized cooperation in partially observable multi-agent systems requires effective communication among agents. To this end, this work focuses on the class of problems where global communication is available but may be unreliable, thus precluding differentiable communication learning methods. We introduce FCMNet, a reinforcement learning-based approach that allows agents to simultaneously learn a) an effective multi-hop communication protocol and b) a common, decentralized policy that enables team-level decision-making. Specifically, our proposed method utilizes the hidden states of multiple directional recurrent neural networks as communication messages among agents. Using a simple multi-hop topology, we endow each agent with the ability to receive information sequentially encoded by every other agent at each time step, leading to improved global cooperation. We demonstrate FCMNet on a challenging set of StarCraft II micromanagement tasks with shared rewards, as well as on a collaborative multi-agent pathfinding task with individual rewards. Our comparison results show that FCMNet outperforms state-of-the-art communication-based reinforcement learning methods on all StarCraft II micromanagement tasks, and value decomposition methods on certain tasks. We further investigate the robustness of FCMNet under realistic communication disturbances, such as random message loss or binarized messages (i.e., non-differentiable communication channels), to showcase FCMNet's potential applicability to robotic tasks under a variety of real-world conditions.
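To make the multi-hop messaging idea concrete, the following is a minimal numpy sketch of how the hidden state of a directional RNN, unrolled over the agent dimension rather than over time, can double as the message each agent passes to the next. All names, cell choices, and dimensions here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical sketch: a simple tanh RNN cell, shared across agents,
# is unrolled along the agent ordering. Each agent's hidden state is
# both its own encoding and the message handed to the next agent, so
# agent i receives information sequentially encoded by all agents
# earlier in the hop order. A second cell runs in the reverse
# direction so every agent hears from every other agent.
rng = np.random.default_rng(0)
n_agents, obs_dim, hid_dim = 4, 8, 16

# One weight matrix per direction (shared by all agents in that pass).
W_fwd = rng.standard_normal((hid_dim, obs_dim + hid_dim)) * 0.1
W_bwd = rng.standard_normal((hid_dim, obs_dim + hid_dim)) * 0.1

def rnn_pass(obs, W, order):
    """Unroll the RNN over agents in `order`; return per-agent hidden
    states, each of which acts as the outgoing message of that agent."""
    h = np.zeros(hid_dim)
    messages = [None] * len(order)
    for i in order:
        h = np.tanh(W @ np.concatenate([obs[i], h]))
        messages[i] = h
    return messages

obs = rng.standard_normal((n_agents, obs_dim))
fwd = rnn_pass(obs, W_fwd, range(n_agents))             # hops 0 -> N-1
bwd = rnn_pass(obs, W_bwd, range(n_agents - 1, -1, -1))  # hops N-1 -> 0

# Each agent's decentralized policy input combines its local observation
# with the two directional messages, which together carry information
# from every other agent.
policy_inputs = [np.concatenate([obs[i], fwd[i], bwd[i]])
                 for i in range(n_agents)]
print(policy_inputs[0].shape)  # (obs_dim + 2 * hid_dim,) = (40,)
```

A robustness experiment like the paper's message-loss test could then be sketched by zeroing out individual messages before concatenation; because the channel is just a vector hand-off, binarizing it (a non-differentiable channel) would only affect training, not this forward pass.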