Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents. Existing methods adapt curricula independently over either environment parameters (in single-agent settings) or co-player policies (in multi-agent settings). However, the strengths and weaknesses of co-players can manifest themselves differently depending on environmental features. It is thus crucial to consider the dependency between the environment and co-player when shaping a curriculum in multi-agent domains. In this work, we use this insight and extend Unsupervised Environment Design (UED) to multi-agent environments. We then introduce Multi-Agent Environment Design Strategist for Open-Ended Learning (MAESTRO), the first multi-agent UED approach for two-player zero-sum settings. MAESTRO efficiently produces adversarial, joint curricula over both environments and co-players and attains minimax-regret guarantees at Nash equilibrium. Our experiments show that MAESTRO outperforms a number of strong baselines on competitive two-player games, spanning discrete and continuous control settings.
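As a rough formal sketch of the objective the abstract alludes to (the notation below is illustrative and not taken from the abstract itself): writing $\Theta$ for the space of environment parameters, $\Pi$ for the policy space, and $U^{\theta}(\pi, \bar{\pi})$ for the student's expected return in environment $\theta$ against co-player $\bar{\pi}$, a minimax-regret objective over joint (environment, co-player) pairs can be stated as

\[
\min_{\pi \in \Pi} \; \max_{\theta \in \Theta,\, \bar{\pi} \in \Pi} \; \operatorname{Regret}^{\theta}(\pi, \bar{\pi}),
\qquad
\operatorname{Regret}^{\theta}(\pi, \bar{\pi}) \;=\; \max_{\pi^{\dagger} \in \Pi} U^{\theta}(\pi^{\dagger}, \bar{\pi}) \;-\; U^{\theta}(\pi, \bar{\pi}),
\]

i.e., the student minimizes its worst-case regret over the joint space of environments and co-players, which is the sense in which the minimax-regret guarantee at Nash equilibrium should be read.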