HAMMER: Multi-Level Coordination of Reinforcement Learning Agents via Learned Messaging

Cooperative multi-agent reinforcement learning (MARL) has achieved significant results, most notably by leveraging the representation learning abilities of deep neural networks. However, large centralized approaches quickly become infeasible as the number of agents scales, and fully decentralized approaches can miss important opportunities for information sharing and coordination. Furthermore, not all agents are equal: in some cases, individual agents may not even be able to send messages to other agents or explicitly model them. This paper considers the case where a single, powerful central agent can observe the entire observation space, while multiple low-powered local agents receive only local observations and cannot communicate with each other. The job of the central agent is to learn what message to send to each local agent based on the global observations. It does so not by centrally solving the entire problem and sending action commands, but by determining what additional information an individual agent should receive so that it can make a better decision. After explaining our MARL algorithm, HAMMER, and where it would be most applicable, we implement it in the cooperative navigation and multi-agent walker domains. Empirical results show that 1) learned communication does indeed improve system performance, 2) results generalize to different numbers of agents, and 3) results generalize to different reward structures.
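The information flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the dimensions, the single linear layers, and the choice to build the global observation by concatenating the local observations are all assumptions made for illustration, and random untrained weights stand in for the learned policies.

```python
import random


def random_matrix(rows, cols, rng):
    """Random weights standing in for a trained network (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights, inputs):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


class CentralAgent:
    """Hypothetical central agent: sees everything, outputs one message per local agent."""

    def __init__(self, n_agents, obs_dim, msg_dim, rng):
        self.n_agents = n_agents
        self.msg_dim = msg_dim
        self.weights = random_matrix(n_agents * msg_dim, n_agents * obs_dim, rng)

    def messages(self, local_observations):
        # Assumption: the global observation is the concatenation of all local ones.
        global_obs = [x for obs in local_observations for x in obs]
        flat = linear(self.weights, global_obs)
        # Slice the flat output into one message vector per local agent.
        return [flat[i * self.msg_dim:(i + 1) * self.msg_dim]
                for i in range(self.n_agents)]


class LocalAgent:
    """Hypothetical local agent: acts on its own observation plus its message."""

    def __init__(self, obs_dim, msg_dim, act_dim, rng):
        self.weights = random_matrix(act_dim, obs_dim + msg_dim, rng)

    def act(self, observation, message):
        # The message is additional input, not an action command.
        return linear(self.weights, observation + message)


rng = random.Random(0)
n_agents, obs_dim, msg_dim, act_dim = 3, 4, 2, 2

central = CentralAgent(n_agents, obs_dim, msg_dim, rng)
local_agents = [LocalAgent(obs_dim, msg_dim, act_dim, rng) for _ in range(n_agents)]

observations = [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
                for _ in range(n_agents)]
msgs = central.messages(observations)
actions = [agent.act(obs, msg)
           for agent, obs, msg in zip(local_agents, observations, msgs)]
```

In this sketch the local agents never exchange data with each other; the only shared information reaches them through the central agent's messages, which mirrors the setting the abstract describes.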
