Meta Automatic Curriculum Learning

A major challenge in the Deep RL (DRL) community is to train agents that can generalize their control policy to situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which has pushed researchers towards rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task spaces, it is essential to rely on some form of Automatic Curriculum Learning (ACL) to adapt the task sampling distribution to a given learning agent instead of sampling tasks at random, since many tasks could end up being either trivial or infeasible. Because prior knowledge about such task spaces is hard to obtain, many ACL algorithms explore the task space to detect progress niches over time. This is a costly tabula-rasa process that must be repeated for each new learning agent, even though different agents might have similar capability profiles. To address this limitation, we introduce the concept of Meta-ACL, i.e. algorithms seeking to generalize curriculum generation to an (unknown) distribution of learners, and formalize it in the context of black-box RL learners. In this work, we present AGAIN, a first instantiation of Meta-ACL, and showcase its benefits for curriculum generation over classical ACL in multiple simulated environments, including procedurally generated parkour environments with learners of varying morphologies. Videos and code are available at https://sites.google.com/view/meta-acl .
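
To make the classical ACL setting described above concrete, the following is a minimal sketch of a learning-progress-based task sampler over a one-dimensional continuous task parameter. It is not AGAIN and not the authors' implementation: the class name `ProgressBanditACL`, the binning of the task space, the window-based progress estimate, and every parameter (`n_bins`, `window`, `eps`) are hypothetical choices made only for illustration. The teacher treats the learner as a black box, observing only the reward obtained on each sampled task.

```python
import random
from collections import deque


class ProgressBanditACL:
    """Toy ACL teacher (illustrative assumption, not the paper's method).

    Splits a 1-D task parameter range into bins and samples bins in
    proportion to their absolute learning progress.
    """

    def __init__(self, low, high, n_bins=10, window=20, eps=0.2, seed=0):
        self.low, self.high, self.n_bins = low, high, n_bins
        self.eps = eps  # probability of sampling a bin uniformly at random
        self.rng = random.Random(seed)
        # Recent rewards observed in each bin (bounded history).
        self.history = [deque(maxlen=window) for _ in range(n_bins)]

    def _bin_of(self, task):
        frac = (task - self.low) / (self.high - self.low)
        return min(self.n_bins - 1, max(0, int(frac * self.n_bins)))

    def learning_progress(self, b):
        """Absolute difference between the mean of the newer and older half of the window."""
        h = list(self.history[b])
        if len(h) < 2:
            return 0.0
        half = len(h) // 2
        old = sum(h[:half]) / half
        new = sum(h[half:]) / (len(h) - half)
        return abs(new - old)

    def sample_task(self):
        lp = [self.learning_progress(b) for b in range(self.n_bins)]
        if self.rng.random() < self.eps or sum(lp) == 0.0:
            # Explore: no progress signal yet, or random exploration step.
            b = self.rng.randrange(self.n_bins)
        else:
            # Exploit: favor bins where the learner's reward is changing.
            b = self.rng.choices(range(self.n_bins), weights=lp)[0]
        width = (self.high - self.low) / self.n_bins
        return self.low + (b + self.rng.random()) * width

    def update(self, task, reward):
        """Record the reward the black-box learner obtained on a sampled task."""
        self.history[self._bin_of(task)].append(reward)
```

In a training loop one would alternate `task = teacher.sample_task()`, run the learner on that task, and call `teacher.update(task, reward)`. Bins whose rewards stay flat, whether because the tasks are trivial or infeasible, receive no progress signal and are visited only through the `eps` exploration term. This per-learner exploration is the tabula-rasa cost that the abstract says Meta-ACL aims to reduce.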
