Grasping unseen objects in unconstrained, cluttered environments is an essential skill for autonomous robotic manipulation. Despite recent progress in full 6-DoF grasp learning, existing approaches often consist of complex sequential pipelines that have several potential failure points and run times unsuitable for closed-loop grasping. Therefore, we propose an end-to-end network that efficiently generates a distribution of 6-DoF parallel-jaw grasps directly from a depth recording of a scene. Our novel grasp representation treats 3D points of the recorded point cloud as potential grasp contacts. By rooting the full 6-DoF grasp pose and width in the observed point cloud, we can reduce the dimensionality of our grasp representation to 4-DoF, which greatly facilitates the learning process. Our class-agnostic approach is trained on 17 million simulated grasps and generalizes well to real-world sensor data. In a robotic grasping study of unseen objects in structured clutter, we achieve over 90% success rate, cutting the failure rate in half compared to a recent state-of-the-art method.
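The abstract does not spell out the parameterization, so the following NumPy sketch is only a hedged illustration of the general idea: a contact-rooted 4-DoF prediction (a 3-DoF orientation encoded here by an approach direction and a baseline direction, plus a 1-DoF gripper width) can be lifted back to a full 6-DoF pose because the translation is anchored at an observed 3D contact point. The function name, the two-vector orientation encoding, and the fixed `gripper_depth` offset are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def grasp_pose_from_contact(contact, approach, baseline, width, gripper_depth=0.10):
    """Illustrative reconstruction of a 6-DoF parallel-jaw grasp pose from a
    contact-rooted, reduced grasp parameterization (hypothetical convention).

    contact:       observed 3D point on the point cloud, used as a grasp contact.
    approach:      direction along which the gripper approaches the contact.
    baseline:      direction connecting the two finger contacts.
    width:         predicted opening width of the parallel jaws.
    gripper_depth: assumed fixed offset from the gripper base to the contact
                   line; a property of the gripper, not a network prediction.
    """
    a = approach / np.linalg.norm(approach)
    # Re-orthogonalize the baseline against the approach axis so the two
    # predicted directions yield a valid rotation.
    b = baseline - np.dot(baseline, a) * a
    b = b / np.linalg.norm(b)
    # Rotation: columns are the baseline axis, the cross product closing a
    # right-handed frame, and the approach axis.
    R = np.stack([b, np.cross(a, b), a], axis=1)
    # Translation: start at the observed contact, move halfway along the
    # baseline to the gripper center, then back along the approach direction
    # to the gripper-base origin.
    t = contact + 0.5 * width * b - gripper_depth * a
    pose = np.eye(4)
    pose[:3, :3] = R
    pose[:3, 3] = t
    return pose
```

Under this (assumed) convention, only the orientation and width must be regressed per point; the translation comes for free from the observed contact, which is one way to read the abstract's claim that rooting the pose in the point cloud reduces the learning problem to 4-DoF.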