Cooperative multi-agent reinforcement learning (MARL) has achieved significant results, most notably by leveraging the representation-learning abilities of deep neural networks. However, large centralized approaches quickly become infeasible as the number of agents grows, and fully decentralized approaches can miss important opportunities for information sharing and coordination. Furthermore, not all agents are equal -- in some cases, individual agents may not even have the ability to communicate with other agents or to explicitly model them. This paper considers the case where there is a single, powerful \emph{central agent} that can observe the entire observation space, and multiple, low-powered \emph{local agents} that receive only local observations and cannot communicate with each other. The central agent learns what message to send to each local agent based on the global observations: rather than centrally solving the entire problem and issuing action commands, it determines what additional information an individual agent should receive so that it can make a better decision. In this work we present our MARL algorithm \algo, describe where it would be most applicable, and evaluate it in the cooperative navigation and multi-agent walker domains. Empirical results show that 1) learned communication does indeed improve system performance, 2) results generalize to heterogeneous local agents, and 3) results generalize to different reward structures.
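To make the described setup concrete, the following minimal sketch (not the paper's implementation; all module names such as CentralAgent and LocalAgent, and all layer sizes, are illustrative assumptions) shows one way a central policy could map global observations to per-agent messages, which each local agent then combines with its own local observation to select an action.
\begin{verbatim}
# Minimal PyTorch sketch of the central/local agent architecture
# described in the abstract. Names and sizes are illustrative
# assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

N_AGENTS, GLOBAL_OBS, LOCAL_OBS, MSG, N_ACTIONS = 3, 24, 8, 4, 5

class CentralAgent(nn.Module):
    """Observes the full global state; emits one message per local agent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(GLOBAL_OBS, 64), nn.ReLU(),
            nn.Linear(64, N_AGENTS * MSG),
        )

    def forward(self, global_obs):
        # Shape (batch, N_AGENTS, MSG): one learned message per agent.
        return self.net(global_obs).view(-1, N_AGENTS, MSG)

class LocalAgent(nn.Module):
    """Acts on its local observation plus the message it received."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LOCAL_OBS + MSG, 32), nn.ReLU(),
            nn.Linear(32, N_ACTIONS),
        )

    def forward(self, local_obs, msg):
        # Returns action logits for this agent.
        return self.net(torch.cat([local_obs, msg], dim=-1))

central = CentralAgent()
local_agents = [LocalAgent() for _ in range(N_AGENTS)]

global_obs = torch.randn(1, GLOBAL_OBS)
local_obs = torch.randn(1, N_AGENTS, LOCAL_OBS)
messages = central(global_obs)
actions = [agent(local_obs[:, i], messages[:, i]).argmax(-1)
           for i, agent in enumerate(local_agents)]
\end{verbatim}
Note that the central agent outputs messages rather than actions, so each local agent still makes its own decision from local information plus whatever the central agent has learned is worth sharing -- this is the distinction the abstract draws between sending action commands and sending additional information.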