Seamless integration of robots into human environments requires robots to learn how to use existing human tools. Current approaches for learning tool manipulation skills mostly rely on expert demonstrations provided in the target robot environment, for example, by manually guiding the robot manipulator or by teleoperation. In this work, we introduce an automated approach that replaces the expert demonstration with a YouTube video when learning a tool manipulation strategy. The main contributions are twofold. First, we design a procedure that aligns the simulated environment with the real-world scene observed in the video. This is formulated as an optimization problem that finds the spatial alignment of the tool trajectory that maximizes the sparse goal reward given by the environment. Second, we describe an imitation learning approach that focuses on the trajectory of the tool rather than the motion of the human. To this end, we combine reinforcement learning with an optimization procedure to find a control policy and the placement of the robot based on the tool motion in the aligned environment. We demonstrate the proposed approach on spade, scythe and hammer tools in simulation, and show the effectiveness of the trained spade policy on a real Franka Emika Panda robot.
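
To make the alignment idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a 2D tool trajectory, a rigid alignment (translation plus rotation), a made-up sparse reward that is 1 only when the tool ends inside a goal region, and plain random search as the optimizer. All function names, parameter ranges, and the reward definition are hypothetical and chosen for illustration only.

```python
import numpy as np


def transform(trajectory, dx, dy, theta):
    """Rotate a 2D trajectory (N x 2 array) by theta and translate it by (dx, dy)."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return trajectory @ rotation.T + np.array([dx, dy])


def sparse_goal_reward(trajectory, goal, radius=0.1):
    """Hypothetical stand-in for the environment reward:
    1.0 if the last tool position lies inside the goal region, else 0.0."""
    return float(np.linalg.norm(trajectory[-1] - goal) <= radius)


def align_by_random_search(trajectory, goal, n_samples=5000, seed=0):
    """Sample candidate alignments and keep the one with the highest reward.

    Random search is an assumption made for this sketch; it is used here
    because a sparse 0/1 reward provides no gradient to follow.
    """
    rng = np.random.default_rng(seed)
    best_params, best_reward = None, -1.0
    for _ in range(n_samples):
        params = (
            rng.uniform(-1.0, 1.0),      # translation along x
            rng.uniform(-1.0, 1.0),      # translation along y
            rng.uniform(-np.pi, np.pi),  # rotation angle
        )
        reward = sparse_goal_reward(transform(trajectory, *params), goal)
        if reward > best_reward:
            best_params, best_reward = params, reward
        if best_reward == 1.0:  # the sparse reward cannot get any higher
            break
    return best_params, best_reward


# Toy "video" trajectory: a straight 0.3 m tool stroke starting at the origin.
video_trajectory = np.stack([np.linspace(0.0, 0.3, 20), np.zeros(20)], axis=1)
goal = np.array([0.5, 0.2])

best_params, best_reward = align_by_random_search(video_trajectory, goal)
aligned_trajectory = transform(video_trajectory, *best_params)
```

In this sketch, `aligned_trajectory` plays the role of the tool motion expressed in the simulated scene. A policy-learning stage, which is not shown here, would then try to reproduce that motion with the robot.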