Safely Learning Visuo-Tactile Feedback Policies in Real For Industrial Insertion

Industrial insertion tasks are often performed repetitively with parts that are subject to tight tolerances and prone to breakage. In this paper, we present a safe method to learn a visuo-tactile insertion policy that is robust against grasp pose variations while minimizing human inputs and collision between the robot and the environment. We achieve this by dividing the insertion task into two phases. In the first align phase, we learn a tactile-based grasp pose estimation model to align the insertion part with the receptacle. In the second insert phase, we learn a vision-based policy to guide the part into the receptacle. Using force-torque sensing, we also develop a safe self-supervised data collection pipeline that limits collision between the part and the surrounding environment. Physical experiments on the USB insertion task from the NIST Assembly Taskboard suggest that our approach can achieve 45/45 insertion successes on 45 different initial grasp poses, improving on two baselines: (1) a behavior cloning agent trained on 50 human insertion demonstrations (1/45) and (2) an online RL policy (TD3) trained in real (0/45).
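The force-torque-limited data collection mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0.5 mm step, the 5 N force limit, and the simulated sensor are all assumptions made for illustration. It shows only the general idea of a guarded move that stops at the last safe depth once the measured force exceeds a limit, and that labels each attempt automatically.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GuardedMoveResult:
    final_depth_mm: float                 # last depth reached without exceeding the limit
    contact: bool                         # True if the force limit was hit
    log: List[Tuple[float, float]]        # (probed depth in mm, measured force in N)


def guarded_insert(
    read_force_n: Callable[[float], float],  # hypothetical sensor: depth (mm) -> force (N)
    target_depth_mm: float,
    step_mm: float = 0.5,                     # assumed step size
    force_limit_n: float = 5.0,               # assumed safety threshold
) -> GuardedMoveResult:
    """Advance in small steps and stop as soon as the force limit is exceeded."""
    depth = 0.0
    log: List[Tuple[float, float]] = []
    while depth < target_depth_mm:
        next_depth = min(depth + step_mm, target_depth_mm)
        force = read_force_n(next_depth)
        log.append((next_depth, force))
        if force > force_limit_n:
            # Collision detected: stay at the last safe depth and report contact.
            return GuardedMoveResult(depth, True, log)
        depth = next_depth
    return GuardedMoveResult(depth, False, log)


def self_supervised_label(result: GuardedMoveResult, target_depth_mm: float) -> int:
    """Label an attempt 1 (success) if the target depth was reached without contact."""
    return int(not result.contact and result.final_depth_mm >= target_depth_mm)


def simulated_misaligned_force(depth_mm: float) -> float:
    """Toy sensor model: a misaligned part jams after 2 mm and force grows linearly."""
    return 0.0 if depth_mm <= 2.0 else 4.0 * (depth_mm - 2.0)


def simulated_aligned_force(depth_mm: float) -> float:
    """Toy sensor model: an aligned part sees only light friction."""
    return 0.2
```

In this toy setup, an aligned attempt reaches the target depth and is labeled a success, while a misaligned attempt is stopped shortly after the jam begins and is labeled a failure, so no human annotation is needed for either case.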
