Reward functions are at the heart of every reinforcement learning (RL) algorithm. In robotic grasping, rewards are often complex and manually engineered functions that do not rely on well-justified physical models from grasp analysis. This work demonstrates that analytic grasp stability metrics constitute powerful optimization objectives for RL algorithms that refine grasps on a three-fingered hand using only tactile and joint position information. We outperform a binary-reward baseline by 42.9% and find that a combination of geometric and force-agnostic grasp stability metrics yields the highest average success rates of 95.4% for cuboids, 93.1% for cylinders, and 62.3% for spheres across wrist position errors between 0 and 7 centimeters and rotational errors between 0 and 14 degrees. In a second experiment, we show that grasp refinement algorithms trained with contact feedback (contact positions, normals, and forces) perform up to 6.6% better than a baseline that receives no tactile information.
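As a rough illustration of how analytic grasp stability metrics can replace a binary success signal in an RL reward, the sketch below combines a hypothetical geometric term with a hypothetical force-agnostic term into a single scalar reward computed from contact positions and normals. The metric definitions, weights, and function names here are assumptions for illustration only, not the formulation used in this work.

```python
import numpy as np

def geometric_metric(contact_positions, object_com):
    """Hypothetical geometric term: reward contact configurations whose
    centroid lies close to the object's center of mass (mapped to [0, 1])."""
    centroid = contact_positions.mean(axis=0)
    dist = np.linalg.norm(centroid - object_com)
    return float(np.exp(-dist))

def force_agnostic_metric(contact_normals):
    """Hypothetical force-agnostic term: reward contact normals (unit vectors)
    that roughly oppose each other, approximating closure without force data."""
    residual = np.linalg.norm(contact_normals.sum(axis=0)) / len(contact_normals)
    return float(1.0 - residual)

def grasp_reward(contact_positions, contact_normals, object_com,
                 w_geo=0.5, w_fa=0.5):
    """Weighted combination of the two metrics, used as the RL reward."""
    return (w_geo * geometric_metric(contact_positions, object_com)
            + w_fa * force_agnostic_metric(contact_normals))
```

A dense, physically grounded reward of this form gives the policy a gradient toward more stable grasps at every step, whereas a binary reward only signals success or failure after lifting the object.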