This paper introduces a discrete-continuous action space for learning insertion primitives in robotic assembly tasks. A primitive is a sequence of elementary actions with certain exit conditions, such as "pushing down the peg until contact". Because a primitive abstracts robot control commands and encodes human prior knowledge, it reduces the exploration difficulty and improves learning efficiency. Specifically, we formulate insertion primitives as parameterized actions: hybrid actions consisting of a discrete primitive type and continuous primitive parameters. Compared with previous work that uses a set of discretized parameters for each primitive, the agent in our method can freely choose primitive parameters from a continuous space, which is more flexible and efficient. To learn these insertion primitives, we propose Twin-Smoothed Multi-pass Deep Q-Network (TS-MP-DQN), an extension of MP-DQN that uses twin Q-networks to reduce Q-value over-estimation. We validate the method with extensive experiments in simulation and in the real world. In these experiments, our approach achieves higher success rates than three baselines: MP-DQN with parameterized actions, primitives with discrete parameters, and continuous velocity control. Furthermore, the learned primitives are robust to sim-to-real transfer and generalize to challenging assembly tasks, such as tight round peg-hole insertion and complex-shaped electric connectors, with promising success rates. Experiment videos are available at https://msc.berkeley.edu/research/insertion-primitives.html.
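The parameterized-action idea can be illustrated with a minimal sketch. It is not the paper's implementation: the names `ParameterizedAction`, `select_action`, `toy_q1`, and `toy_q2` are hypothetical, the Q-functions are hard-coded stand-ins for learned networks, and combining the twin estimates by an element-wise minimum is an assumption (a common way to curb over-estimation, which may differ from the exact TS-MP-DQN rule).

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np


@dataclass
class ParameterizedAction:
    """Hybrid action: a discrete primitive type plus its continuous parameters."""
    primitive: int       # index of the chosen primitive type
    params: np.ndarray   # continuous parameters for that primitive


# Hypothetical signature: (state, parameters of every primitive) -> one Q-value per primitive.
QFunction = Callable[[np.ndarray, Sequence[np.ndarray]], np.ndarray]


def select_action(
    state: np.ndarray,
    params_per_primitive: Sequence[np.ndarray],
    q1: QFunction,
    q2: QFunction,
) -> ParameterizedAction:
    """Pick the primitive with the highest pessimistic (twin-minimum) Q-value."""
    # Assumption: the two Q estimates are combined with an element-wise minimum.
    q_min = np.minimum(
        q1(state, params_per_primitive),
        q2(state, params_per_primitive),
    )
    k = int(np.argmax(q_min))
    return ParameterizedAction(k, np.asarray(params_per_primitive[k], dtype=float))


# Toy stand-ins for learned Q-networks (fixed values, for illustration only).
def toy_q1(state, params):
    return np.array([1.0, 3.0])


def toy_q2(state, params):
    return np.array([2.0, 0.5])


state = np.zeros(3)
# Hypothetical parameters proposed for two primitives, e.g. a distance and a (distance, speed) pair.
proposed = [np.array([0.02]), np.array([0.01, 0.5])]

action = select_action(state, proposed, toy_q1, toy_q2)
print(action.primitive, action.params)
```

In this toy case the first estimator alone would prefer primitive 1, but the twin minimum is `[1.0, 0.5]`, so primitive 0 is selected together with its continuous parameter.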