We approach the problem of learning by watching humans in the wild. While traditional approaches in Imitation and Reinforcement Learning are promising for learning in the real world, they are either sample-inefficient or constrained to lab settings. Meanwhile, there has been considerable success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective. We call our method WHIRL: In-the-Wild Human Imitating Robot Learning. WHIRL extracts a prior over the intent of the human demonstrator and uses it to initialize our agent's policy. We introduce an efficient real-world policy learning scheme that improves through interaction. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos, and an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild. Videos and talk at https://human2robot.github.io
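To make the "simple sampling-based policy optimization" contribution concrete, below is a minimal, generic cross-entropy-method (CEM) sketch of that family of optimizers. This is an illustration, not WHIRL's actual implementation: the `reward` function here is a hypothetical stand-in for the paper's human/robot video-alignment objective, and all parameter names and dimensions are invented for the example.

```python
import numpy as np

# Hypothetical stand-in for the video-alignment objective: in this toy
# example, reward is just negative distance to a fixed target vector.
TARGET = np.array([0.5, -0.2, 0.1])

def reward(params):
    return -np.linalg.norm(params - TARGET)

def cem_optimize(dim=3, iters=20, pop=64, elite_frac=0.125, seed=0):
    """Generic CEM loop: sample parameters from a Gaussian, evaluate
    them, keep the top-scoring elites, and refit the Gaussian to them."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([reward(s) for s in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution; small floor keeps std > 0.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

best = cem_optimize()
```

In a real-robot setting each "score" would come from executing the sampled parameters on hardware and evaluating the video-alignment objective, which is why sample efficiency (few samples per iteration, warm-started from the human-intent prior) matters.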