Imitation learning is an effective tool for robotic learning tasks where specifying a reinforcement learning (RL) reward is not feasible or where the exploration problem is particularly difficult. Imitation methods, typically behavior cloning or inverse RL, derive a policy from a collection of first-person action-state trajectories. This is contrary to how humans and other animals imitate: we observe a behavior, even from other species, understand its perceived effect on the state of the environment, and figure out what actions our body can perform to reach a similar outcome. In this work, we explore the possibility of third-person visual imitation of manipulation trajectories, from vision alone and without access to actions, demonstrated by embodiments different from that of our imitating agent. Specifically, we investigate what would be an appropriate representation with which an RL agent can visually track trajectories of complex manipulation behavior (non-planar, with multiple-object interactions) demonstrated by experts with different embodiments. We present a way to train manipulator-independent representations (MIR) that primarily focus on the change in the environment and have all the characteristics that make them suitable for cross-embodiment visual imitation with RL: cross-domain alignment, temporal smoothness, and being actionable. We show that with our proposed method our agents are able to imitate, with complex robot control, trajectories from a variety of embodiments and with significant visual and dynamics differences, e.g., the simulation-to-reality gap.