Learning fine-grained movements is among the most challenging problems in robotics, and this holds especially true for robotic hands. Robotic sign language acquisition, or more specifically fingerspelling acquisition in robots, can be considered a specific instance of such a challenge. In this paper, we propose an approach for learning dexterous motor imitation from video examples without the use of any additional information. We build a URDF model of a robotic hand with a single actuator for each joint. By leveraging pre-trained deep vision models, we extract the 3D pose of the hand from RGB videos. Then, using a state-of-the-art reinforcement learning algorithm for motion imitation (namely, proximal policy optimisation), we train a policy to reproduce the movement extracted from the demonstrations. We identify the best set of hyperparameters for imitating a reference motion. Additionally, we demonstrate the ability of our approach to generalise over six different fingerspelled letters.
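Motion-imitation pipelines of this kind typically score the policy by how closely the simulated joint angles track the reference pose extracted from video. A minimal sketch of such a tracking reward is shown below; the exponential form follows DeepMimic-style imitation rewards, but the exact reward used in the paper, and the `scale` parameter, are assumptions for illustration.

```python
import math

def imitation_reward(q, q_ref, scale=5.0):
    """Exponential pose-tracking reward (hypothetical sketch).

    q     -- current joint angles of the simulated hand (radians)
    q_ref -- reference joint angles extracted from the demonstration video
    scale -- assumed sensitivity of the reward to tracking error

    Returns 1.0 for a perfect match, decaying toward 0 as the
    squared joint-angle error grows.
    """
    err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    return math.exp(-scale * err)

# A perfect match yields the maximum reward of 1.0;
# any deviation from the reference pose lowers it.
print(imitation_reward([0.1, 0.5, -0.2], [0.1, 0.5, -0.2]))  # 1.0
```

In practice, a reward like this is evaluated at every simulation step against the time-aligned frame of the demonstration, and the policy is optimised with PPO to maximise its cumulative value over the motion.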