Achieving distributed reinforcement learning (RL) for large-scale cooperative multi-agent systems (MASs) is challenging because: (i) each agent has access to only limited information; and (ii) convergence and computational-complexity issues arise due to the curse of dimensionality. In this paper, we propose a general, computationally efficient, distributed framework for cooperative multi-agent reinforcement learning (MARL) that exploits the graph structures inherent in the problem. We introduce three coupling graphs describing three types of inter-agent coupling in MARL, namely the state graph, the observation graph, and the reward graph. By further considering a communication graph, we propose two distributed RL approaches based on local value functions derived from the coupling graphs. The first approach reduces sample complexity significantly under specific conditions on the four graphs above. The second approach provides an approximate solution and remains efficient even for problems with dense coupling graphs; here there is a trade-off between minimizing the approximation error and reducing the computational complexity. Simulations show that our RL algorithms scale to large-scale MASs significantly better than centralized and consensus-based distributed RL algorithms.
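To make the graph-based setting concrete, the following minimal Python sketch illustrates one way the coupling graphs could be represented and how an agent's local value-function scope might be extracted from them. This is an illustration only, not the paper's construction: the adjacency representation, the function `local_scope`, and the rule of taking the union of coupling-graph neighbors are all assumptions made for exposition.

```python
# Illustrative sketch (assumed, not the paper's algorithm): the coupling graphs are
# stored as adjacency sets over agent indices, and each agent's local value-function
# scope is approximated by its neighbors across the three coupling graphs.
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # agent index -> set of coupled agents

def local_scope(agent: int,
                state_g: Graph,
                obs_g: Graph,
                reward_g: Graph) -> Set[int]:
    """Agents whose variables plausibly enter this agent's local value function:
    the agent itself plus its neighbors in the three coupling graphs (assumed rule)."""
    scope = {agent}
    for g in (state_g, obs_g, reward_g):
        scope |= g.get(agent, set())
    return scope

# Toy example with 4 agents: sparse couplings keep every local scope small,
# which is the intuition behind the reduced sample/computational complexity.
state_g  = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
obs_g    = {0: set(), 1: {2}, 2: set(), 3: set()}
reward_g = {0: {1}, 1: set(), 2: {3}, 3: set()}

for i in range(4):
    print(i, sorted(local_scope(i, state_g, obs_g, reward_g)))
```

When the coupling graphs are sparse, each scope stays small regardless of the total number of agents, which is what would allow a local learner to avoid the curse of dimensionality; with dense graphs, a truncated or approximate scope would trade accuracy for tractability, mirroring the trade-off described above.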