We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations and use these skills to synthesize prolonged robot behaviors. Our method starts by constructing a hierarchical task structure from each demonstration through agglomerative clustering. From the task structures of multi-task demonstrations, we identify skills based on recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning. Finally, we train a meta controller to compose these skills to solve long-horizon manipulation tasks. The entire model can be trained on a small set of human demonstrations collected within 30 minutes without further annotations, making it amenable to real-world deployment. We systematically evaluate our method in simulation environments and on a real robot. Our method outperforms state-of-the-art imitation learning methods in multi-stage manipulation tasks. Furthermore, skills discovered from multi-task demonstrations boost the average task success rate by $8\%$ compared to those discovered from individual tasks.