A Visual Communication Map for Multi-Agent Deep Reinforcement Learning

Deep reinforcement learning has been applied successfully to a variety of real-world problems, and its use in multi-agent settings is growing. Multi-agent learning poses a distinct challenge: setting up a concealed communication medium among the agents. Each agent draws on the information in this medium to choose its subsequent actions in a distributed manner. The goal is to leverage the cooperation of multiple agents to achieve a designated objective efficiently. Recent studies typically combine a specialized neural network with reinforcement learning to enable communication between agents. This approach, however, limits the number of agents or requires the system to be homogeneous. In this paper, we propose a more scalable approach that not only handles a large number of agents but also enables collaboration between functionally dissimilar agents and can be combined with any deep reinforcement learning method. Specifically, we create a global communication map that visually represents the status of each agent in the system. The visual map and the environmental state are fed to a shared-parameter network to train multiple agents concurrently. Finally, we select the Asynchronous Advantage Actor-Critic (A3C) algorithm to demonstrate the proposed scheme, which we call Visual communication map for Multi-agent A3C (VMA3C). Simulation results show that the visual communication map improves the performance of A3C in multi-agent problems in terms of learning speed, achieved reward, and robustness.
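
The idea of a global communication map can be illustrated with a minimal sketch. The abstract does not specify how agent status is encoded or how the map is combined with the environmental state, so everything below is an assumption made for illustration: the `AgentStatus` fields, the grid layout, the intensity encoding of agent type, and the choice to stack the map with the environment frame as a second channel are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

Grid = List[List[float]]


@dataclass
class AgentStatus:
    """Hypothetical per-agent status; the fields are illustrative assumptions."""
    row: int
    col: int
    kind: int      # 0-based id of the agent's functional type
    active: bool   # inactive agents are not drawn on the map


def build_communication_map(agents: List[AgentStatus], height: int,
                            width: int, num_kinds: int) -> Grid:
    """Draw every active agent onto one shared 2D map.

    Assumed encoding: the cell at the agent's position holds an intensity
    in (0, 1] that identifies its type; empty cells hold 0.0.
    """
    comm_map = [[0.0] * width for _ in range(height)]
    for agent in agents:
        if not agent.active:
            continue
        if not (0 <= agent.row < height and 0 <= agent.col < width):
            raise ValueError("agent position lies outside the map")
        comm_map[agent.row][agent.col] = (agent.kind + 1) / num_kinds
    return comm_map


def stack_observation(env_frame: Grid, comm_map: Grid) -> List[Grid]:
    """Pair the environment frame with the map as a two-channel input.

    Assumption: both grids have the same height and width, so a single
    shared-parameter network could consume them together.
    """
    same_height = len(env_frame) == len(comm_map)
    same_width = all(len(a) == len(b) for a, b in zip(env_frame, comm_map))
    if not (same_height and same_width):
        raise ValueError("environment frame and map must have equal shape")
    return [env_frame, comm_map]


# Usage: two active agents of different types and one inactive agent.
agents = [
    AgentStatus(row=0, col=1, kind=0, active=True),
    AgentStatus(row=2, col=2, kind=1, active=True),
    AgentStatus(row=1, col=1, kind=1, active=False),
]
comm_map = build_communication_map(agents, height=3, width=3, num_kinds=2)
env_frame = [[0.0] * 3 for _ in range(3)]
observation = stack_observation(env_frame, comm_map)
```

Under these assumptions the map has a fixed size regardless of how many agents are drawn on it, and agents of different types share one representation, which is one way to read the scalability and heterogeneity claims above.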
