In this work we propose a learning approach to high-precision robotic assembly problems in the continuous action domain. Unlike many learning-based approaches that rely heavily on vision or spatial tracking, our approach takes force/torque measurements as its only observation. Moreover, each policy learned with our approach is robot-agnostic and can be applied to different robotic arms. These two features greatly reduce the complexity and cost of performing robotic assembly in the real world, especially in unstructured settings such as architectural construction. To achieve this, we have developed a new distributed RL agent, named Recurrent Distributed DDPG (RD2), which extends Ape-X DDPG with recurrence and makes two structural improvements to prioritized experience replay. Our results show that RD2 solves two fundamental high-precision assembly tasks, lap-joint and peg-in-hole, and outperforms two state-of-the-art algorithms, Ape-X DDPG and PPO with LSTM. We have successfully evaluated our robot-agnostic policies on three robotic arms, Kuka KR60, Franka Panda, and UR10, in simulation. A video presenting our experiments is available at https://sites.google.com/view/rd2-rl