In artificial multi-agent systems, the ability to learn collaborative policies is predicated upon the agents' communication skills: they must be able to encode the information received from the environment and learn how to share it with other agents as required by the task at hand. We present Connectivity Driven Communication (CDC), a deep reinforcement learning approach that facilitates the emergence of multi-agent collaborative behaviour purely through experience. The agents are modelled as nodes of a weighted graph whose state-dependent edges encode the pair-wise messages that can be exchanged. We introduce a graph-dependent attention mechanism that controls how the agents' incoming messages are weighted. This mechanism takes into full account the current state of the system as represented by the graph, and builds upon a diffusion process that captures how information flows over the graph. The graph topology is not assumed to be known a priori, but depends dynamically on the agents' observations and is learnt concurrently with the attention mechanism and policy in an end-to-end fashion. Our empirical results show that CDC learns effective collaborative policies and can outperform competing learning algorithms on cooperative navigation tasks.
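To make the description above concrete, the following is a minimal, hypothetical sketch of the two ingredients the abstract names: state-dependent edge weights computed from pairwise agent observations, and a short diffusion over the resulting graph that determines how each agent's incoming messages are weighted. The module name, layer choices, and the number of diffusion steps are assumptions for illustration only, not the actual CDC architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphDiffusionComm(nn.Module):
    """Hypothetical sketch (not the authors' implementation): build a
    state-dependent weighted graph over N agents and aggregate messages
    using K-step diffusion weights as attention coefficients."""

    def __init__(self, obs_dim: int, msg_dim: int, n_diffusion_steps: int = 3):
        super().__init__()
        self.edge_net = nn.Linear(2 * obs_dim, 1)   # pairwise edge scorer (assumed form)
        self.msg_net = nn.Linear(obs_dim, msg_dim)  # per-agent message encoder (assumed form)
        self.K = n_diffusion_steps

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (N, obs_dim) -- one observation vector per agent
        N = obs.size(0)

        # State-dependent adjacency: score every ordered pair of observations.
        pairs = torch.cat(
            [obs.unsqueeze(1).expand(N, N, -1),
             obs.unsqueeze(0).expand(N, N, -1)], dim=-1)
        A = F.softplus(self.edge_net(pairs)).squeeze(-1)          # (N, N), non-negative
        A = A * (1.0 - torch.eye(N, device=obs.device))           # drop self-loops

        # Row-normalise to a transition matrix and run a short diffusion,
        # so the attention weights reflect multi-hop information flow on the graph.
        P = A / A.sum(dim=-1, keepdim=True).clamp(min=1e-8)
        attn = torch.linalg.matrix_power(P, self.K)               # K-step diffusion weights

        msgs = self.msg_net(obs)                                  # (N, msg_dim)
        return attn @ msgs                                        # weighted incoming messages
```

In such a sketch, the edge scorer, the diffusion weights, and the downstream policy would all receive gradients from the reinforcement learning loss, matching the end-to-end learning described in the abstract.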