We build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real time. Human hands and robot hands differ in shape, size, and joint structure, and performing this translation from a single uncalibrated camera is a highly underconstrained problem. Moreover, the retargeted trajectories must effectively execute tasks on a physical robot, which requires them to be temporally smooth and free of self-collisions. Our key insight is that while paired human-robot correspondence data is expensive to collect, the internet contains a massive corpus of rich and diverse human hand videos. We leverage this data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth, swift, safe, and semantically similar to the guiding demonstration. We demonstrate that it enables previously untrained people to teleoperate a robot on various dexterous manipulation tasks. Our low-cost, glove-free, marker-free remote teleoperation system makes robot teaching more accessible, and we hope that it can aid robots in learning to act autonomously in the real world. Videos at https://robotic-telekinesis.github.io/