Learning to Centralize Dual-Arm Assembly

Robotic manipulators are widely used in modern manufacturing processes. However, their deployment in unstructured environments remains an open problem. To deal with the variety, complexity, and uncertainty of real-world manipulation tasks, it is essential to develop a flexible framework that makes fewer assumptions about the characteristics of the environment. In recent years, reinforcement learning (RL) has shown strong results for single-arm robotic manipulation. However, research focusing on dual-arm manipulation is still rare. From a classical control perspective, solving such tasks often involves complex modeling of the interactions between the two manipulators and the objects involved in the task, as well as coupling the two robots at the control level. Instead, in this work, we explore the applicability of model-free RL to dual-arm assembly. As we aim to contribute towards an approach that is not limited to dual-arm assembly but applies to dual-arm manipulation in general, we keep modeling efforts to a minimum. Hence, to avoid modeling the interaction between the two robots and the assembly tools used, we present a modular approach with two decentralized single-arm controllers which are coupled using a single centralized learned policy. We further reduce modeling effort by using only sparse rewards. Our architecture enables successful assembly and simple transfer from simulation to the real world. We demonstrate the effectiveness of the framework on dual-arm peg-in-hole and analyze sample efficiency and success rates for different action spaces. Moreover, we compare results on different clearances and showcase disturbance recovery and robustness when dealing with position uncertainties. Finally, we zero-shot transfer policies trained in simulation to the real world and evaluate their performance.
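
The abstract describes the architecture only at a high level. The Python sketch below illustrates one possible reading of it under explicitly stated assumptions and is not the authors' implementation. `SingleArmController`, `move_toward_each_other`, `sparse_reward`, `step`, the 1 cm per-step limit, and the 2 mm success tolerance are all hypothetical. The learned centralized policy is replaced by a hand-written placeholder. The peg and hole positions are approximated by the two end-effector positions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SingleArmController:
    """Hypothetical decentralized controller for one arm.

    It knows nothing about the other arm: it applies a Cartesian
    displacement to its own end effector, clipped per axis to max_step.
    """
    position: Vec3
    max_step: float = 0.01  # assumed per-step limit in metres

    def apply(self, delta: Sequence[float]) -> Vec3:
        clipped = [max(-self.max_step, min(self.max_step, d)) for d in delta]
        self.position = tuple(p + c for p, c in zip(self.position, clipped))
        return self.position


# A centralized policy maps the joint observation of BOTH arms to one
# concatenated 6-D action. In the paper it is learned; here it is a stand-in.
Policy = Callable[[List[float]], List[float]]


def move_toward_each_other(obs: List[float]) -> List[float]:
    """Hand-written placeholder for a learned policy (hypothetical)."""
    left, right = obs[:3], obs[3:]
    mid = [(lp + rp) / 2 for lp, rp in zip(left, right)]
    left_action = [m - lp for m, lp in zip(mid, left)]
    right_action = [m - rp for m, rp in zip(mid, right)]
    return left_action + right_action


def sparse_reward(peg: Vec3, hole: Vec3, tolerance: float = 0.002) -> float:
    """Return 1.0 only when the peg is within `tolerance` of the hole, else 0.0."""
    dist = sum((a - b) ** 2 for a, b in zip(peg, hole)) ** 0.5
    return 1.0 if dist < tolerance else 0.0


def step(policy: Policy, left: SingleArmController,
         right: SingleArmController) -> float:
    """One control step: a single policy output is split between two controllers."""
    obs = list(left.position) + list(right.position)
    action = policy(obs)
    left.apply(action[:3])   # first half of the action drives the left arm
    right.apply(action[3:])  # second half drives the right arm
    # Simplification: the end-effector positions stand in for peg and hole.
    return sparse_reward(left.position, right.position)


if __name__ == "__main__":
    left = SingleArmController(position=(0.0, 0.0, 0.0))
    right = SingleArmController(position=(0.05, 0.0, 0.0))
    print([step(move_toward_each_other, left, right) for _ in range(3)])
    # prints [0.0, 0.0, 1.0]
```

The sketch isolates one design point. Neither single-arm controller observes the other arm, so coordination comes only from the centralized policy, which sees both arms and emits one concatenated action. The reward stays at zero until the success condition is met. Replacing the placeholder with a trained network would leave the controller code unchanged.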
