Training general agents to follow complex instructions (tasks) in intricate environments (levels) remains a core challenge in reinforcement learning. Randomly sampling task-level pairs often produces unsolvable combinations, highlighting the need to co-design tasks and levels. While unsupervised environment design (UED) has proven effective at automatically designing level curricula, prior work has considered only a fixed task. We present ATLAS (Aligning Tasks and Levels for Autocurricula of Specifications), a novel method that generates joint autocurricula over tasks and levels. Our approach builds upon UED to automatically produce solvable yet challenging task-level pairs for policy training. To evaluate ATLAS and drive progress in the field, we introduce an evaluation suite that models tasks as reward machines in Minigrid levels. Experiments demonstrate that ATLAS vastly outperforms random sampling, particularly when solvable pairs are unlikely to be drawn at random. We further show that mutations exploiting the structure of both tasks and levels accelerate convergence to performant policies.
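To make the task representation concrete, the sketch below models a reward machine as a finite-state automaton whose transitions fire on high-level events and emit reward, in the spirit of the evaluation suite described above. This is a minimal illustration under stated assumptions: the class, field, and event names (`RewardMachine`, `got_key`, `opened_door`) are hypothetical and do not reflect ATLAS's actual interface.

```python
# Minimal sketch of a reward machine: a finite-state automaton whose
# transitions fire on high-level propositional events and emit reward.
# All names are illustrative assumptions, not ATLAS's implementation.
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    # transitions[(state, event)] = (next_state, reward)
    transitions: dict = field(default_factory=dict)
    initial_state: str = "u0"
    terminal_states: set = field(default_factory=set)

    def step(self, state: str, event: str):
        """Advance on an observed event; unmatched events self-loop with zero reward."""
        return self.transitions.get((state, event), (state, 0.0))

    def is_done(self, state: str) -> bool:
        return state in self.terminal_states


# Example task: "pick up the key, then open the door".
rm = RewardMachine(
    transitions={
        ("u0", "got_key"): ("u1", 0.0),
        ("u1", "opened_door"): ("u_acc", 1.0),
    },
    terminal_states={"u_acc"},
)

state = rm.initial_state
for event in ["moved", "got_key", "opened_door"]:
    state, reward = rm.step(state, event)
print(state, rm.is_done(state))  # -> u_acc True
```

Representing tasks this way makes the co-design problem explicit: whether a given reward machine is satisfiable depends on the level (e.g., a `got_key` transition is unreachable in a level with no key), which is exactly the alignment that joint curricula over task-level pairs must maintain.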