Almost all multi-agent reinforcement learning algorithms without communication follow the principle of centralized training with decentralized execution. During centralized training, agents can be guided by the same signals, such as the global state; during decentralized execution, however, they lack this shared signal. Inspired by viewpoint invariance and contrastive learning, we propose consensus learning for cooperative multi-agent reinforcement learning in this paper. Although each agent acts on its own local observation, different agents can infer the same consensus in a discrete space. During decentralized execution, we feed the inferred consensus as an explicit input to each agent's network, thereby fostering their spirit of cooperation. Our proposed method can be extended to various multi-agent reinforcement learning algorithms with only small model changes. Moreover, we evaluate it on several fully cooperative tasks and obtain convincing results.
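To make the execution-time mechanism concrete, here is a minimal sketch (not the authors' code) of an agent that infers a discrete consensus from its local observation and conditions its policy on it. All names and sizes (ConsensusAgent, n_consensus, the argmax-based discretization) are illustrative assumptions; the paper's training objective, such as contrastively aligning the codes inferred by different agents, is not shown.

```python
# Hypothetical sketch: an agent maps its local observation to a discrete
# "consensus" index, then feeds that consensus as an explicit input to its
# policy network. Module and parameter names are illustrative, not the paper's.
import torch
import torch.nn as nn

class ConsensusAgent(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, n_consensus: int = 8):
        super().__init__()
        # Encoder that scores discrete consensus codes from the local observation.
        self.consensus_encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_consensus)
        )
        # Learned embedding for each discrete consensus code.
        self.consensus_embed = nn.Embedding(n_consensus, 16)
        # Policy network conditioned on observation plus consensus embedding.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + 16, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        logits = self.consensus_encoder(obs)       # scores over discrete codes
        consensus = logits.argmax(dim=-1)          # inferred discrete consensus
        z = self.consensus_embed(consensus)        # explicit consensus input
        return self.policy(torch.cat([obs, z], dim=-1))

# During decentralized execution, each agent runs this on its own observation;
# training would align different agents' inferred codes (e.g. contrastively).
agent = ConsensusAgent(obs_dim=10, n_actions=4)
action_scores = agent(torch.randn(2, 10))  # batch of two local observations
print(action_scores.shape)                 # torch.Size([2, 4])
```

The key design point this sketch illustrates is that the consensus is discrete and computed purely from local observations, so no communication channel is needed at execution time: if training succeeds in aligning the encoders, agents observing the same situation map to the same code.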