Multi-agent settings remain a fundamental challenge in reinforcement learning (RL) due to partial observability and the lack of accurate real-time interaction among agents. In this paper, we propose a new method based on local communication learning to tackle multi-agent RL (MARL) when a large number of agents coexist. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and to learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To coordinate the behaviors of neighboring agents more effectively, we enhance the mean-field approximation with a supervised policy rectification network (PRN) that rectifies real-time agent interactions, and with a learnable compensation term that corrects the approximation bias. The proposed method enables efficient coordination and outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).
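To make the depthwise-convolution communication idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes agents are laid out on a spatial grid (as in ATSC), and the class and parameter names (LocalCommBlock, feat_dim, kernel_size) are hypothetical. A depthwise convolution mixes each feature channel only across neighboring grid cells, so every agent's message is built from its local neighborhood rather than from all agents.

```python
import torch
import torch.nn as nn

class LocalCommBlock(nn.Module):
    """Hypothetical local-communication layer: each agent aggregates
    features only from its spatial neighbors via depthwise convolution."""

    def __init__(self, feat_dim: int, kernel_size: int = 3):
        super().__init__()
        # groups=feat_dim makes the convolution depthwise: each channel is
        # mixed across neighboring grid cells but not across channels.
        self.depthwise = nn.Conv2d(feat_dim, feat_dim, kernel_size,
                                   padding=kernel_size // 2, groups=feat_dim)
        # 1x1 pointwise convolution mixes channels per agent afterwards.
        self.pointwise = nn.Conv2d(feat_dim, feat_dim, kernel_size=1)

    def forward(self, agent_grid: torch.Tensor) -> torch.Tensor:
        # agent_grid: (batch, feat_dim, H, W), one grid cell per agent.
        messages = self.depthwise(agent_grid)        # neighbor aggregation
        return torch.relu(self.pointwise(messages))  # per-agent channel mixing

# Usage: an 8x8 grid of agents, each with a 32-dimensional hidden state.
comm = LocalCommBlock(feat_dim=32)
out = comm(torch.randn(1, 32, 8, 8))  # (1, 32, 8, 8) neighbor-aware features
```

The inputs could equally be neighboring agents' policy logits instead of hidden states, which would mirror the abstract's point about learning the effect of joint actions from neighbors' policies.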