Generating agents that can achieve Zero-Shot Coordination (ZSC) with unseen partners is a new challenge in cooperative Multi-Agent Reinforcement Learning (MARL). Recently, some studies have made progress on ZSC by exposing the agents to diverse partners during training. These methods usually train the partners via self-play, implicitly assuming that the task is homogeneous, i.e., that the two players' roles are interchangeable. However, many real-world tasks are heterogeneous, with distinct roles for the players, and previous methods may therefore fail. In this paper, we study the heterogeneous ZSC problem for the first time and propose a general method based on coevolution, which coevolves two populations of agents and partners through three sub-processes: pairing, updating, and selection. Experimental results on a collaborative cooking task show the necessity of considering the heterogeneous setting and illustrate that our proposed method is a promising solution for heterogeneous cooperative MARL.
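The three sub-processes named above can be sketched as a generic coevolution loop over two separate populations. The abstract does not specify the pairing rule, the policy update, or the selection criterion, so everything below is a hedged toy stand-in: individuals are scalar "genomes", the update is random mutation in place of a learning step, and selection keeps the top half of each population by best cross-play fitness. The `fitness` function and all parameter names are illustrative assumptions, not the paper's method.

```python
import random

random.seed(0)

def coevolve(agents, partners, fitness, generations=50, mutate_scale=0.1):
    """Toy coevolution of two populations via pairing, updating, selection.

    A sketch only: the paper's actual sub-processes are not specified in
    the abstract, so each step here is a simple illustrative stand-in.
    """
    for _ in range(generations):
        # Pairing: match each agent with a random partner from the other
        # population (a stand-in for whatever pairing rule the method uses).
        pairs = [(a, random.choice(partners)) for a in agents]
        # Updating: perturb every individual (a stand-in for an RL update).
        agents = [a + random.gauss(0, mutate_scale) for a, _ in pairs]
        partners = [p + random.gauss(0, mutate_scale) for p in partners]
        # Selection: rank each population by its best cross-play return,
        # keep the top half, and refill by cloning the survivors.
        agents.sort(key=lambda a: -max(fitness(a, p) for p in partners))
        partners.sort(key=lambda p: -max(fitness(a, p) for a in agents))
        half = len(agents) // 2
        agents = agents[:half] + agents[:half]
        partners = partners[:half] + partners[:half]
    return agents, partners

# Toy heterogeneous task: the two roles are NOT interchangeable -- the
# payoff is highest when the agent's parameter is near +1.0 while the
# partner's is near -1.0, so swapping roles would hurt the score.
def fitness(agent, partner):
    return -((agent - 1.0) ** 2 + (partner + 1.0) ** 2)

agents = [random.uniform(-2, 2) for _ in range(8)]
partners = [random.uniform(-2, 2) for _ in range(8)]
agents, partners = coevolve(agents, partners, fitness)
best = max(fitness(a, p) for a in agents for p in partners)
```

Keeping agents and partners in separate populations, rather than a single self-play population, is what lets the loop handle heterogeneous roles: each side is selected against the other population's best responses instead of against copies of itself.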