We propose a new keypoint-based 6-DoF grasp pose synthesis approach from 2D/2.5D input. Keypoint-based grasp detectors operating on image input have demonstrated promising results in prior work, where the additional visual information provided by color images compensates for noisy depth perception. However, such detectors rely heavily on accurately predicting keypoint locations in image space. In this paper, we devise a new grasp generation network that reduces the dependency on precise keypoint estimation. Given an RGB-D input, our network estimates both the grasp pose from keypoint detection and the scale towards the camera. We further redesign the keypoint output space to mitigate the negative impact of keypoint prediction noise on the Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method outperforms the baseline by a large margin, validating the efficacy of our approach. Finally, despite being trained only on simple synthetic objects, our method demonstrates sim-to-real capability, showing competitive results in real-world robot experiments.
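To make the PnP step concrete: given a known 3D layout of gripper keypoints and their detected 2D image locations, the 6-DoF pose can be recovered by solving the Perspective-n-Point problem. Below is a minimal numpy-only sketch using the classical Direct Linear Transform (DLT) formulation on calibrated (intrinsics-normalized) image points; it is an illustrative stand-in, not the paper's network or keypoint parameterization, and the point layout and function name are hypothetical.

```python
import numpy as np

def dlt_pnp(X, x):
    """Recover pose [R|t] from n >= 6 3D-2D keypoint correspondences.

    X : (n, 3) 3D keypoints in the object (gripper) frame.
    x : (n, 2) detected image points in normalized camera coordinates,
        i.e. pixel coordinates pre-multiplied by K^-1.
    Returns R (3, 3) and t (3,) such that x ~ project(R @ X + t).
    """
    n = X.shape[0]
    A = np.zeros((2 * n, 12))
    for i in range(n):
        Xh = np.append(X[i], 1.0)          # homogeneous 3D point
        u, v = x[i]
        # Two linear constraints per correspondence on P = [R|t].
        A[2 * i, 4:8] = -Xh
        A[2 * i, 8:12] = v * Xh
        A[2 * i + 1, 0:4] = Xh
        A[2 * i + 1, 8:12] = -u * Xh
    # The null vector of A (least-squares solution under noise) gives P.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    M, p4 = P[:, :3], P[:, 3]
    # Fix the unknown scale, and the sign via positive depth (cheirality).
    s = 1.0 / np.linalg.norm(M[2])
    if (s * (M @ X[0] + p4))[2] < 0:
        s = -s
    # Project s*M onto the nearest rotation matrix.
    U, _, Vt2 = np.linalg.svd(s * M)
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt2)]) @ Vt2
    t = s * p4
    return R, t

# Synthetic check: project keypoints with a known pose, then recover it.
rng = np.random.default_rng(0)
theta = 0.4
R_gt = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
t_gt = np.array([0.1, -0.2, 1.5])
X = rng.uniform(-0.3, 0.3, size=(8, 3))       # hypothetical keypoint layout
Xc = X @ R_gt.T + t_gt                        # keypoints in camera frame
x = Xc[:, :2] / Xc[:, 2:3]                    # perspective projection
R_est, t_est = dlt_pnp(X, x)
```

Because the DLT solution is a purely algebraic least-squares fit, small errors in the detected 2D keypoints can translate into disproportionately large pose errors, which is precisely the sensitivity the abstract's redesigned keypoint output space aims to mitigate.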