Learning fine-grained movements is a challenging problem in robotics, particularly for robotic hands. One specific instance of this challenge is teaching robots to fingerspell in sign language. In this paper, we propose an approach for learning dexterous motor imitation from video examples without additional information. To achieve this, we first build a URDF model of a robotic hand with a single actuator per joint. We then leverage pre-trained deep vision models to extract the 3D pose of the hand from RGB videos. Next, using state-of-the-art reinforcement learning algorithms for motion imitation (namely, proximal policy optimization and soft actor-critic), we train a policy to reproduce the movement extracted from the demonstrations. We identify the optimal set of hyperparameters for imitation using a reference motion. Finally, we demonstrate the generalizability of our approach by testing it on six different tasks corresponding to fingerspelled letters. Our results show that our approach successfully imitates these fine-grained movements without additional information, highlighting its potential for real-world robotics applications.
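The core of the imitation step described above is training a policy whose joint configuration tracks the reference motion extracted from video. A common way to express this in motion-imitation RL (e.g., in DeepMimic-style approaches) is an exponentiated pose-tracking reward; the abstract does not specify the exact reward used, so the sketch below is illustrative, with the function name, gain `k`, and joint-angle representation all assumed:

```python
import math

def imitation_reward(joint_angles, ref_angles, k=2.0):
    """Illustrative pose-tracking reward for motion imitation.

    joint_angles: the policy's current joint angles (radians).
    ref_angles:   the reference pose extracted from the demonstration video.
    k:            an assumed gain controlling how sharply reward decays
                  with tracking error (not taken from the paper).

    Returns a value in (0, 1]: 1.0 for perfect tracking, decaying
    toward 0 as the squared joint-angle error grows.
    """
    err = sum((q - r) ** 2 for q, r in zip(joint_angles, ref_angles))
    return math.exp(-k * err)

# Perfect tracking gives the maximum reward; deviation lowers it.
print(imitation_reward([0.1, 0.5, -0.2], [0.1, 0.5, -0.2]))  # → 1.0
print(imitation_reward([0.0, 0.0, 0.0], [0.1, 0.5, -0.2]) < 1.0)  # → True
```

A reward of this shape is dense and bounded, which suits on-policy methods like PPO as well as off-policy methods like SAC; the actual per-joint weighting and any additional terms (e.g., velocity tracking) would depend on the paper's full formulation.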