Learning communication strategies in cooperative multi-agent reinforcement learning (MARL) has recently attracted considerable attention. Early studies typically assumed a fully connected communication topology among agents, which incurs high communication costs and may not be feasible in practice. Some recent works have developed adaptive communication strategies to reduce communication overhead, but these methods cannot effectively obtain valuable information from agents beyond the communication range. In this paper, we consider a realistic communication model in which each agent has a limited communication range and the communication topology changes dynamically. To facilitate effective agent communication, we propose a novel communication protocol called Adaptively Controlled Two-Hop Communication (AC2C). After an initial local communication round, AC2C employs an adaptive two-hop communication strategy, implemented by a communication controller, to enable long-range information exchange among agents and boost performance. The controller determines whether each agent should request two-hop messages, which helps reduce communication overhead during distributed execution. We evaluate AC2C on three cooperative multi-agent tasks, and the experimental results show that it outperforms relevant baselines with lower communication costs.
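The two-round scheme described in the abstract can be illustrated with a small sketch. This is not the AC2C implementation: every name here (`neighbors`, `two_round_communication`, `gate`) is hypothetical, messages are plain scalars, aggregation is a simple mean, and the learned communication controller is replaced by an arbitrary user-supplied `gate` function. It only shows how a second, gated round lets an agent receive information from agents two hops away while keeping a count of the messages sent.

```python
import math


def neighbors(positions, comm_range):
    """One-hop neighbor lists: agents within comm_range of each other."""
    n = len(positions)
    return [
        [j for j in range(n)
         if j != i and math.dist(positions[i], positions[j]) <= comm_range]
        for i in range(n)
    ]


def mean(values):
    return sum(values) / len(values)


def two_round_communication(positions, observations, comm_range, gate):
    """Return (final per-agent states, number of messages sent).

    gate(i, state) is a stand-in for a learned controller (an assumption):
    it decides whether agent i requests second-round messages.
    """
    nbrs = neighbors(positions, comm_range)
    sent = 0

    # Round 1: every agent averages its own observation with those of
    # its one-hop neighbors.
    round1 = []
    for i, obs in enumerate(observations):
        inbox = [observations[j] for j in nbrs[i]]
        sent += len(inbox)
        round1.append(mean([obs] + inbox))

    # Round 2: only gated agents ask neighbors for their round-1 states,
    # which already summarize agents two hops away.
    round2 = []
    for i in range(len(positions)):
        if nbrs[i] and gate(i, round1[i]):
            inbox = [round1[j] for j in nbrs[i]]
            sent += len(inbox)
            round2.append(mean([round1[i]] + inbox))
        else:
            round2.append(round1[i])
    return round2, sent


# Three agents on a line; agent 0 and agent 2 are out of range of each other.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observations = [0.0, 0.0, 6.0]

no_second_round = two_round_communication(
    positions, observations, 1.0, lambda i, s: False)
only_agent_0 = two_round_communication(
    positions, observations, 1.0, lambda i, s: i == 0)
print(no_second_round)  # ([0.0, 2.0, 3.0], 4)
print(only_agent_0)     # ([1.0, 2.0, 3.0], 5)
```

In this toy setup, agent 0 sees nothing of agent 2's observation after the first round. Once it is gated into the second round, its state changes at the cost of one extra message, whereas gating every agent would cost four extra messages.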