Learning to Centralize Dual-Arm Assembly

Even though industrial manipulators are widely used in modern manufacturing processes, deployment in unstructured environments remains an open problem. To deal with the variety, complexity, and uncertainty of real-world manipulation tasks, a general framework is essential. In this work, we focus on assembly with humanoid robots by providing a framework for dual-arm peg-in-hole manipulation. Because we aim for an approach that extends beyond dual-arm peg-in-hole to dual-arm manipulation in general, we keep modeling effort to a minimum. While reinforcement learning has shown great results for single-arm robotic manipulation in recent years, research focusing on dual-arm manipulation is still rare. Solving such tasks often involves complex modeling of the interaction between the two manipulators and of their coupling at the control level. In this paper, we explore the applicability of model-free reinforcement learning to dual-arm manipulation, based on a modular approach with two decentralized single-arm controllers and a single centralized policy. We further reduce modeling effort by using sparse rewards only. We demonstrate the effectiveness of the framework on dual-arm peg-in-hole and analyze sample efficiency and success rates for different action spaces. Moreover, we compare results for different clearances and showcase disturbance recovery and robustness to position uncertainties. Finally, we transfer policies trained in simulation to the real world zero-shot and evaluate their performance.
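To make the modular structure concrete, the following is a minimal sketch of one control step in which a single centralized policy commands two decentralized single-arm controllers, together with a sparse success reward. Everything here is an illustrative assumption rather than the method evaluated in this work: the names `ArmController`, `centralized_step`, `sparse_reward`, and `midpoint_policy` are hypothetical, the action is assumed to be a per-arm position offset, the controller is a plain proportional law, and the hand-written `midpoint_policy` merely stands in for a learned policy.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class ArmController:
    """Hypothetical single-arm controller: proportional step toward a setpoint."""
    gain: float = 0.5

    def step(self, position: Sequence[float], setpoint: Sequence[float]) -> Vector:
        # Each controller only sees its own arm; there is no coupling term.
        return [p + self.gain * (s - p) for p, s in zip(position, setpoint)]


def sparse_reward(peg: Sequence[float], hole: Sequence[float],
                  tolerance: float = 1e-2) -> float:
    """Return 1.0 only when peg and hole coincide within tolerance, else 0.0."""
    return 1.0 if math.dist(peg, hole) <= tolerance else 0.0


def centralized_step(policy: Callable[[Vector], Vector],
                     left_pos: Sequence[float], right_pos: Sequence[float],
                     left_ctrl: ArmController,
                     right_ctrl: ArmController) -> Tuple[Vector, Vector]:
    """One step: a single policy observes both arms and emits one joint action."""
    observation = list(left_pos) + list(right_pos)
    action = policy(observation)  # assumed: position offsets for both arms
    n = len(left_pos)
    left_setpoint = [p + a for p, a in zip(left_pos, action[:n])]
    right_setpoint = [p + a for p, a in zip(right_pos, action[n:])]
    # The joint action is split and handed to the two independent controllers.
    return (left_ctrl.step(left_pos, left_setpoint),
            right_ctrl.step(right_pos, right_setpoint))


def midpoint_policy(observation: Vector) -> Vector:
    """Stand-in for a learned policy: move both arms toward their midpoint."""
    n = len(observation) // 2
    left, right = observation[:n], observation[n:]
    to_mid_left = [(r - l) / 2 for l, r in zip(left, right)]
    to_mid_right = [(l - r) / 2 for l, r in zip(left, right)]
    return to_mid_left + to_mid_right


if __name__ == "__main__":
    left, right = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
    ctrl_l, ctrl_r = ArmController(gain=1.0), ArmController(gain=1.0)
    left, right = centralized_step(midpoint_policy, left, right, ctrl_l, ctrl_r)
    print(left, right, sparse_reward(left, right))
```

In this sketch the only place the two arms interact is inside the policy, which sees the concatenated observation; the controllers themselves stay independent. The reward carries no shaping term, so a learning algorithm would receive a signal only on successful insertion.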
