In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks, a long-standing challenge in robot learning. We approach this challenge from an imitation learning perspective, aiming to understand how scaling and broadening the collected data can facilitate such generalization. To that end, we develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions, and that can be conditioned on different forms of task specification, including pre-trained embeddings of natural language or videos of humans performing the task. When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%, without any robot demonstrations for those tasks.
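To make the conditioning scheme mentioned above concrete, the following is a minimal sketch, not the paper's actual architecture: a behavioral-cloning policy that fuses image features with a frozen, pre-computed task embedding (e.g., a sentence embedding of the language command or an embedding of a human video). All network dimensions, layer choices, and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TaskConditionedPolicy(nn.Module):
    """Illustrative task-conditioned visuomotor policy (hypothetical dims)."""

    def __init__(self, task_dim=512, action_dim=7):
        super().__init__()
        # Vision encoder: maps an RGB observation to a feature vector.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse image features with the frozen task embedding and
        # regress a continuous action (e.g., end-effector deltas).
        self.head = nn.Sequential(
            nn.Linear(64 + task_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image, task_embedding):
        feats = self.vision(image)
        return self.head(torch.cat([feats, task_embedding], dim=-1))

# One behavioral-cloning step on demonstration (or intervention) data.
policy = TaskConditionedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
image = torch.randn(8, 3, 128, 128)   # batch of camera frames (dummy data)
task = torch.randn(8, 512)            # frozen task embeddings (dummy data)
expert_action = torch.randn(8, 7)     # demonstrated actions (dummy data)
loss = nn.functional.mse_loss(policy(image, task), expert_action)
opt.zero_grad(); loss.backward(); opt.step()
```

Because the task embedding comes from a pre-trained encoder rather than a learned task index, a policy trained this way can, in principle, be queried with embeddings of commands it was never trained on, which is the zero-shot setting evaluated in the abstract.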