Most successes in robotic manipulation have been restricted to single-arm gripper robots, whose low dexterity limits the range of solvable tasks to pick-and-place, insertion, and object rearrangement. More complex tasks such as assembly require dual and multi-arm platforms, but these entail a suite of unique challenges such as bi-arm coordination and collision avoidance, robust grasping, and long-horizon planning. In this work we investigate the feasibility of training deep reinforcement learning (RL) policies in simulation and transferring them to the real world (Sim2Real) as a generic methodology for obtaining performant controllers for real-world bi-manual robotic manipulation tasks. As a testbed for bi-manual manipulation, we develop the U-Shape Magnetic Block Assembly Task, wherein two robots with parallel grippers must connect three magnetic blocks to form a U-shape. Without manually designed controllers or human demonstrations, we demonstrate that, with careful Sim2Real considerations, our policies trained with RL in simulation enable two xArm6 robots to solve the U-shape assembly task with a success rate of above 90% in simulation, and 50% on real hardware without any additional real-world fine-tuning. Through careful ablations, we highlight how each component of the system is critical for such simple and successful policy learning and transfer, including task specification, learning algorithm, direct joint-space control, behavior constraints, perception and actuation noise, action delays, and action interpolation. Our results present a significant step forward for bi-arm capability on real hardware, and we hope our system can inspire future research on deep RL and Sim2Real transfer of bi-manual policies, drastically scaling up the capability of real-world robot manipulators.