We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g., human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g., a robotic manipulator in a simulated kitchen. Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills, such as "opening the microwave" or "turning on the stove". This allows us to transfer demonstrations across environments (e.g., from a real-world to a simulated kitchen) and across agent embodiments (e.g., from a bimanual human demonstration to a robotic arm). We evaluate our approach on three challenging cross-domain learning problems and match the performance of demonstration-accelerated RL approaches that require in-domain demonstrations. In a simulated kitchen environment, our approach learns long-horizon robot manipulation tasks using less than 3 minutes of human video demonstrations from a real-world kitchen. This enables scaling robot learning via the reuse of demonstrations, e.g., collected as human videos, for learning in any number of target domains.