We propose a new keypoint-based 6-DoF grasp pose synthesis approach from 2D/2.5D input. Keypoint-based grasp detectors operating on image input have demonstrated promising results in prior work, where the additional visual information provided by color images compensates for noisy depth perception. However, such detectors rely heavily on accurately predicting keypoint locations in image space. In this paper, we devise a new grasp generation network that reduces the dependency on precise keypoint estimation. Given an RGB-D input, our network estimates both the grasp pose from keypoint detection and the scale towards the camera. We further redesign the keypoint output space to mitigate the negative impact of keypoint prediction noise on the Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method outperforms the baseline by a large margin, validating the efficacy of our approach. Finally, despite being trained only on simple synthetic objects, our method demonstrates sim-to-real capability by achieving competitive results in real-world robot experiments.
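The motivation for predicting an explicit scale towards the camera can be illustrated with a minimal pinhole-projection sketch: under perspective projection, a set of 3D keypoints and the same set uniformly scaled away from the camera center project to identical pixels, so 2D keypoint detections alone cannot disambiguate scale and depth. The intrinsics and keypoint coordinates below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, cx, cy chosen for illustration only).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, pts):
    """Project 3D camera-frame points (N, 3) to pixel coordinates (N, 2)."""
    uv = (K @ pts.T).T          # homogeneous image coordinates
    return uv[:, :2] / uv[:, 2:3]  # perspective divide by depth

# Three illustrative grasp keypoints in the camera frame (meters).
kps = np.array([[ 0.05, 0.00, 0.60],
                [-0.05, 0.00, 0.60],
                [ 0.00, 0.04, 0.62]])

px = project(K, kps)
px_scaled = project(K, 2.0 * kps)  # same shape, twice as large and twice as far

# The projections coincide: keypoints in image space carry no absolute scale,
# which is why the network must regress scale towards the camera separately.
print(np.allclose(px, px_scaled))  # True
```

The same ambiguity is what makes PnP sensitive to pixel-level keypoint noise: small 2D errors trade off against large changes in estimated depth along the viewing ray.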