In recent years, multi-agent reinforcement learning has achieved remarkable performance in multi-agent planning and scheduling tasks. It typically follows the self-play setting, where agents are trained by playing with a fixed group of partners. However, in zero-shot coordination, where an agent must coordinate with unseen partners, self-play agents may fail. Several methods have been proposed to address this problem, but they are either time-consuming or lack generalizability. In this paper, we first reveal an important phenomenon: zero-shot coordination performance is strongly and linearly correlated with the similarity between an agent's training partner and its testing partner. Inspired by this finding, we propose a Similarity-Based Robust Training (SBRT) scheme that improves agents' zero-shot coordination performance by perturbing their partners' actions during training according to a pre-defined policy similarity value. To validate its effectiveness, we apply our scheme to three multi-agent reinforcement learning frameworks and achieve better performance compared with previous methods.
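To make the perturbation idea concrete, below is a minimal illustrative sketch, not the paper's actual SBRT procedure: it assumes a discrete action space and interprets the pre-defined similarity value as the probability of keeping the partner's chosen action, replacing it with a uniformly random action otherwise. The function name and the uniform-random substitution are assumptions for illustration only.

```python
import numpy as np

def perturb_partner_action(partner_action, num_actions, similarity, rng=None):
    """Illustrative sketch (assumed, not the paper's exact method):
    with probability `similarity`, keep the training partner's action;
    otherwise substitute a uniformly random action, so a lower similarity
    value corresponds to a stronger disturbance of the partner's policy."""
    rng = rng or np.random.default_rng()
    if rng.random() < similarity:
        return partner_action            # keep the partner's original action
    return int(rng.integers(num_actions))  # disturb: random replacement action
```

Under this reading, setting the similarity value to 1.0 recovers plain self-play, while lower values expose the learning agent to partners whose behavior increasingly deviates from its training partner.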