Imitation learning is an effective and safe technique for training robot policies in the real world because it does not depend on an expensive random exploration process. However, due to this lack of exploration, learning policies that generalize beyond the demonstrated behaviors remains an open challenge. We present a novel imitation learning framework that enables robots to 1) learn complex real-world manipulation tasks efficiently from a small number of human demonstrations, and 2) synthesize new behaviors not contained in the collected demonstrations. Our key insight is that multi-task domains often present a latent structure in which demonstrated trajectories for different tasks intersect at common regions of the state space. We present Generalization Through Imitation (GTI), a two-stage offline imitation learning algorithm that exploits this intersecting structure to train goal-directed policies that generalize to unseen combinations of start and goal states. In the first stage of GTI, we train a stochastic policy that leverages trajectory intersections to compose behaviors from different demonstration trajectories. In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage and train a goal-directed agent to generalize to novel start and goal configurations. We validate GTI in both simulated domains and a challenging long-horizon robotic manipulation domain in the real world. Additional results and videos are available at https://sites.google.com/view/gti2020/ .
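To make the two-stage structure concrete, below is a minimal, self-contained sketch of this kind of pipeline on a toy 2D problem. Everything in it is a hypothetical simplification for illustration: the nearest-neighbor stand-ins for the stage-1 stochastic policy and the stage-2 goal-conditioned policy, and the helper names make_demo, rollout, and goal_policy, are not the authors' models. As stated in the abstract, GTI's first stage trains a stochastic policy rather than using a lookup; the sketch only mimics how such a policy can compose demonstrations at intersections and how its rollouts can be relabeled with reached goals to train a goal-directed agent.

# Illustrative sketch only, not the GTI implementation. All components are
# simplified stand-ins (nearest-neighbor "policies") to show the two-stage flow.
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstration data: two trajectories whose state visitations overlap
# in a common region of the state space.
def make_demo(start, goal, n=20):
    states = np.linspace(start, goal, n)   # straight-line sequence of states
    actions = np.diff(states, axis=0)      # action = step toward the goal
    return states[:-1], actions

demos = [make_demo(np.array([0.0, 0.0]), np.array([1.0, 1.0])),
         make_demo(np.array([1.0, 0.0]), np.array([0.0, 1.0]))]

# Stage 1 (stand-in): a stochastic policy that, at a given state, samples an
# action from any demonstration passing nearby, so behaviors from different
# trajectories can be composed where they intersect.
all_states = np.concatenate([s for s, _ in demos])
all_actions = np.concatenate([a for _, a in demos])

def stochastic_policy(state, k=5):
    dists = np.linalg.norm(all_states - state, axis=1)
    idx = np.argsort(dists)[:k]            # k nearest demonstrated states
    return all_actions[rng.choice(idx)]    # sample among their actions

# Collect rollouts from the unconditioned stage-1 policy.
def rollout(start, horizon=25):
    traj = [start]
    for _ in range(horizon):
        traj.append(traj[-1] + stochastic_policy(traj[-1]))
    return np.stack(traj)

rollouts = [rollout(np.array([0.0, 0.0])) for _ in range(10)]

# Stage 2 (stand-in): build a goal-conditioned dataset by relabeling each
# rollout's final state as the goal, then fit a goal-directed policy on it
# (here, again a nearest-neighbor learner over (state, goal) pairs).
gc_states, gc_goals, gc_actions = [], [], []
for traj in rollouts:
    goal = traj[-1]
    for s, s_next in zip(traj[:-1], traj[1:]):
        gc_states.append(s)
        gc_goals.append(goal)
        gc_actions.append(s_next - s)
gc_inputs = np.concatenate([np.stack(gc_states), np.stack(gc_goals)], axis=1)

def goal_policy(state, goal):
    query = np.concatenate([state, goal])
    idx = np.argmin(np.linalg.norm(gc_inputs - query, axis=1))
    return gc_actions[idx]

print(goal_policy(np.array([0.0, 0.0]), rollouts[0][-1]))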