We propose Automatic Curricula via Expert Demonstrations (ACED), a reinforcement learning (RL) approach that combines ideas from imitation learning and curriculum learning to solve challenging robotic manipulation tasks with sparse reward functions. Curriculum learning solves complicated RL tasks by introducing a sequence of auxiliary tasks of increasing difficulty, yet how to automatically design effective and generalizable curricula remains an open research problem. ACED extracts curricula from a small number of expert demonstration trajectories by dividing the demonstrations into sections and initializing training episodes at states sampled from different sections. By moving the reset states from the end of the demonstrations toward the beginning as the learning agent improves, ACED not only learns challenging manipulation tasks with unseen initializations and goals, but also discovers novel solutions that are distinct from the demonstrations. In addition, ACED can be naturally combined with other imitation learning methods to use expert demonstrations more efficiently: we show that combining ACED with behavior cloning allows pick-and-place tasks to be learned with as few as one demonstration and block stacking tasks to be learned with 20 demonstrations.
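The reset-state curriculum described above can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `DemoCurriculum`, the equal-length sections, the success-rate threshold, and the sliding window are all assumptions made for illustration, since the abstract does not say how sections are sized or when the curriculum advances. Each demonstration is assumed to be a list of simulator states from which an episode can be restarted.

```python
import random
from collections import deque


class DemoCurriculum:
    """Hypothetical sketch: sample reset states from demonstration sections,
    moving from the last section toward the first as the agent succeeds."""

    def __init__(self, demos, num_sections=5, threshold=0.8, window=20, seed=0):
        self.demos = demos                  # list of demonstrations (lists of states)
        self.num_sections = num_sections    # assumed: equal-length sections
        self.threshold = threshold          # assumed: success rate needed to advance
        self.section = num_sections - 1     # start at the end of the demonstrations
        self.results = deque(maxlen=window) # recent episode outcomes
        self.rng = random.Random(seed)

    def _section_bounds(self, length):
        # Index range [start, end) of the current section in a demo of this length.
        start = self.section * length // self.num_sections
        end = (self.section + 1) * length // self.num_sections
        return start, max(end, start + 1)   # keep at least one state per section

    def sample_reset_state(self):
        # Pick a demonstration, then a state inside the current section.
        demo = self.rng.choice(self.demos)
        start, end = self._section_bounds(len(demo))
        return demo[self.rng.randrange(start, end)]

    def report(self, success):
        # Record an episode outcome; move one section earlier once the
        # window is full and the recent success rate reaches the threshold.
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.section > 0:
            if sum(self.results) / len(self.results) >= self.threshold:
                self.section -= 1
                self.results.clear()
```

In a training loop, one would call `sample_reset_state()` before each episode, restore the simulator to the returned state, run the policy, and pass the sparse success signal to `report()`. Early episodes then start near the goal, where the sparse reward is easy to reach, and later episodes start progressively closer to the beginning of the demonstrations.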