While imitation learning provides us with an efficient toolkit to train robots, learning skills that are robust to environment variations remains a significant challenge. Current approaches address this challenge by relying either on large amounts of demonstrations that span environment variations or on handcrafted reward functions that require state estimates. Neither direction scales to fast imitation. In this work, we present Fast Imitation of Skills from Humans (FISH), a new imitation learning approach that can learn robust visual skills with less than a minute of human demonstrations. Given a weak base policy trained by offline imitation of demonstrations, FISH computes rewards that correspond to the "match" between the robot's behavior and the demonstrations. These rewards are then used to adaptively update a residual policy that augments the base policy. Across all tasks, FISH requires at most twenty minutes of interactive learning to imitate demonstrations on object configurations that were not seen in the demonstrations. Importantly, FISH is constructed to be versatile, which allows it to be used across robot morphologies (e.g., xArm, Allegro, Stretch) and camera configurations (e.g., third-person, eye-in-hand). Our experimental evaluations on 9 different tasks show that FISH achieves an average success rate of 93%, which is around 3.8x higher than prior state-of-the-art methods.
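To make the base-plus-residual structure and the "match" reward concrete, the following is a minimal sketch, not the paper's implementation. All names, dimensions, the linear residual map, and the nearest-neighbor matching cost are illustrative assumptions; FISH's actual policies operate on visual observations and its reward is a learned match between robot and demonstration behavior, which this stub only approximates.

```python
import numpy as np

OBS_DIM, ACTION_DIM = 8, 4  # illustrative dimensions, not from the paper

def base_policy(obs):
    # Weak base policy obtained by offline imitation of <1 minute of
    # demonstrations (e.g., behavior cloning). Stubbed out here.
    return np.zeros(ACTION_DIM)

def residual_policy(obs, base_action, params):
    # Small correction learned through online interaction; a linear map
    # over (obs, base_action) stands in for the actual residual network.
    return params @ np.concatenate([obs, base_action])

def match_reward(robot_traj, demo_traj):
    # "Match" reward between the robot's rollout and a demonstration:
    # here, negative mean nearest-neighbor distance between trajectory
    # points. A hypothetical proxy for the paper's matching-based reward.
    dists = np.linalg.norm(robot_traj[:, None] - demo_traj[None, :], axis=-1)
    return -dists.min(axis=1).mean()

def act(obs, params):
    # Executed action = base policy output + learned residual correction.
    a_base = base_policy(obs)
    return a_base + residual_policy(obs, a_base, params)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = 0.01 * rng.standard_normal((ACTION_DIM, OBS_DIM + ACTION_DIM))
    obs = rng.standard_normal(OBS_DIM)
    print("action:", act(obs, params))
    demo = rng.standard_normal((10, OBS_DIM))
    rollout = demo + 0.1 * rng.standard_normal((10, OBS_DIM))
    print("match reward:", match_reward(rollout, demo))
```

In this decomposition, only the residual parameters are updated during the twenty minutes of interactive learning, while the base policy stays fixed, which is what keeps adaptation fast.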