To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a robot arm and dexterous hand-based system and show strong results on many manipulation tasks, outperforming various state-of-the-art methods. Videos at https://video-dex.github.io
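To make the role of these priors concrete, the following is a minimal, hypothetical PyTorch sketch of how a visual prior (an encoder pretrained on human videos) and an action prior (a head first trained on retargeted human hand/arm trajectories) could guide a robot policy fine-tuned with behavior cloning. The class names, dimensions, and training loop are illustrative assumptions, not the released VideoDex code.

```python
import torch
import torch.nn as nn

class PriorGuidedPolicy(nn.Module):
    """Hypothetical policy: frozen video-pretrained encoder + fine-tuned action head."""

    def __init__(self, encoder: nn.Module, feat_dim: int = 512, action_dim: int = 22):
        super().__init__()
        self.encoder = encoder  # visual prior from human videos, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Action prior: head assumed to be pre-trained on retargeted human
        # trajectories, then fine-tuned on a small set of robot demonstrations.
        self.action_head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            z = self.encoder(obs)           # (B, feat_dim) visual features
        return self.action_head(z)          # predicted arm + hand joint targets


if __name__ == "__main__":
    # Stand-in encoder; in practice this would be a backbone pretrained on
    # internet videos of human hands (e.g. a ResNet-style network).
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))
    policy = PriorGuidedPolicy(encoder)

    # One behavior-cloning step on (image, robot action) pairs.
    images = torch.randn(8, 3, 64, 64)
    expert_actions = torch.randn(8, 22)
    optimizer = torch.optim.Adam(policy.action_head.parameters(), lr=1e-4)
    loss = nn.functional.mse_loss(policy(images), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"BC loss: {loss.item():.4f}")
```

In this sketch, freezing the encoder is one way the video-derived prior can constrain the robot's behavior despite limited real-world data; the specific architecture and training procedure used by VideoDex are described in the paper itself.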