When cast into the Deep Reinforcement Learning framework, many robotics tasks become long-horizon, sparse-reward problems on which learning algorithms struggle. In this context, Imitation Learning (IL) can be a powerful approach to bootstrap the learning process. However, most IL methods require many expert demonstrations, which can be prohibitively difficult to acquire. Only a handful of IL algorithms have proven effective in the extremely low expert-data regime where a single expert demonstration is available. In this paper, we present a novel algorithm designed to imitate complex robotic tasks from the states of a single expert trajectory. Based on a sequential inductive bias, our method divides the complex task into smaller skills. The skills are learned by a goal-conditioned policy that can solve each skill individually and chain the skills to solve the entire task. We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
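As a rough illustration of the sequential inductive bias described above, the sketch below segments a single expert state trajectory into sub-goals and chains them with a goal-conditioned policy. This is not the paper's implementation; the names `policy`, `env`, `n_skills`, and the distance-based success test are illustrative assumptions.

```python
import numpy as np

def segment_into_subgoals(expert_states: np.ndarray, n_skills: int) -> np.ndarray:
    """Pick n_skills evenly spaced states of the expert trajectory as sub-goals.

    Each consecutive pair of sub-goals implicitly defines one 'skill'.
    """
    idx = np.linspace(0, len(expert_states) - 1, n_skills + 1)[1:].astype(int)
    return expert_states[idx]

def rollout_chained_skills(env, policy, subgoals, tol=0.05, max_steps=200):
    """Chain skills: condition the policy on each sub-goal until it is reached.

    `policy(state, goal)` stands for a goal-conditioned policy pi(a | s, g);
    `env` follows the classic Gym step/reset interface (both assumed here).
    """
    state = env.reset()
    for goal in subgoals:
        for _ in range(max_steps):
            action = policy(state, goal)            # goal-conditioned action
            state, _, done, _ = env.step(action)
            if np.linalg.norm(state - goal) < tol:  # sub-goal (skill) reached
                break
        else:
            return False  # this skill failed, so the chain is broken
    return True  # all skills solved in sequence: full task imitated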