When cast into the Deep Reinforcement Learning framework, many robotics tasks require solving a long-horizon, sparse-reward problem, a setting in which learning algorithms struggle. In this context, Imitation Learning (IL) can be a powerful approach to bootstrap the learning process. However, most IL methods require many expert demonstrations, which can be prohibitively difficult to acquire. Only a handful of IL algorithms have shown efficiency in the extremely low expert-data regime where a single expert demonstration is available. In this paper, we present a novel algorithm designed to imitate complex robotic tasks from the states of a single expert trajectory. Based on a sequential inductive bias, our method divides the complex task into smaller skills. The skills are learned by a goal-conditioned policy that can solve each skill individually and chain skills to solve the entire task. We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.