Dexterous in-hand manipulation with a multi-fingered robotic hand is a challenging task, especially when it is performed with the hand oriented upside down, which demands permanent force closure, and when no external sensors are used. For the task of reorienting an object to a given goal orientation (as opposed to spinning it indefinitely around an axis), the lack of external sensors poses an additional fundamental challenge: the state of the object has to be estimated continuously, e.g., to detect when the goal is reached. In this paper, we show that reorienting a cube to any of the 24 possible goal orientations in a $\pi/2$ raster is possible using the torque-controlled DLR-Hand II. The task is learned in simulation using a modular deep reinforcement learning architecture: the actual policy has only a small observation time window of 0.5 s but receives the cube state as an explicit input, which is estimated by a deep differentiable particle filter trained on data generated by running the policy. In simulation, we reach a success rate of 92% while applying significant domain randomization. Via zero-shot Sim2Real transfer to the real robotic system, all 24 goal orientations can be reached with a high success rate.
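To make the count of 24 goal orientations concrete, the following minimal Python sketch enumerates all rotations of a cube in a $\pi/2$ raster as signed permutation matrices with determinant +1. It is an illustration only and not taken from the paper: the function names `cube_goal_orientations` and `angle_to_goal` are hypothetical, and the geodesic angle is just one plausible way to compare an estimated orientation with a goal. The abstract does not state which goal-detection criterion or threshold is actually used.

```python
import itertools

import numpy as np


def cube_goal_orientations():
    """Return the 24 proper rotations that map a cube onto itself.

    Each rotation is a signed permutation matrix with determinant +1.
    (Hypothetical helper for illustration; not from the paper.)
    """
    rotations = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            R = np.zeros((3, 3), dtype=int)
            for row, (col, sign) in enumerate(zip(perm, signs)):
                R[row, col] = sign
            # Keep proper rotations only; det = -1 would be a reflection.
            if round(np.linalg.det(R)) == 1:
                rotations.append(R)
    return rotations


def angle_to_goal(R_est, R_goal):
    """Geodesic angle in radians between an estimated and a goal orientation.

    (Assumed comparison metric; the paper's criterion may differ.)
    """
    cos_theta = (np.trace(R_goal.T @ R_est) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


goals = cube_goal_orientations()
print(len(goals))  # 24
```

Of the 48 signed permutation matrices, exactly half are proper rotations, which gives the 24 orientations. Any two distinct goals in this set are at least $\pi/2$ apart, so an orientation estimate has to be considerably more accurate than that to tell neighboring goals apart.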