Imitation Learning (IL) is an effective framework for learning visuomotor skills from offline demonstration data. However, IL methods often fail to generalize to new scene configurations not covered by the training data. Humans, on the other hand, can manipulate objects under varying conditions. Key to this capability is hand-eye coordination, a cognitive ability that enables humans to adaptively direct their movements toward task-relevant objects while remaining invariant to the objects' absolute spatial locations. In this work, we present a learnable action space, Hand-eye Action Networks (HAN), that can approximate humans' hand-eye coordination behaviors by learning from human teleoperated demonstrations. Through a set of challenging multi-stage manipulation tasks, we show that a visuomotor policy equipped with HAN inherits the key spatial invariance property of hand-eye coordination and achieves zero-shot generalization to new scene configurations. Additional materials are available at https://sites.google.com/stanford.edu/han