In this paper, we introduce a Grasp Manifold Estimator (GraspME) to detect grasp affordances for objects directly in 2D camera images. To perform manipulation tasks autonomously, it is crucial for robots to have such graspability models of the surrounding objects. Grasp manifolds have the advantage of providing a continuum of infinitely many grasps, which is not the case with other grasp representations such as predefined grasp points. For instance, this property can be leveraged in motion optimization to define goal sets as implicit surface constraints in the robot configuration space. In this work, we restrict ourselves to estimating possible end-effector positions directly from 2D camera images. To this end, we define grasp manifolds via a set of keypoints and locate them in images using a Mask R-CNN backbone. Using learned features allows generalization across view angles, to potentially noisy images, and to objects that were not part of the training set. We rely on simulation data only and perform experiments on simple and complex objects, including unseen ones. Our framework achieves an inference speed of 11.5 fps on a GPU, an average precision for keypoint estimation of 94.5%, and a mean pixel distance of only 1.29. These results show that our method localizes objects reliably via bounding boxes and segmentation masks, and closely approximates the keypoint coordinates of the correct grasp manifold.
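To make the keypoint-based formulation concrete, the following is a minimal sketch of how such a detector could be set up, using torchvision's off-the-shelf Keypoint R-CNN as a stand-in for the Mask R-CNN-based architecture described above. This is not the authors' implementation: the number of manifold keypoints, the class count, and the confidence threshold are all placeholder assumptions.

```python
# Minimal sketch (not the paper's code): detecting objects and the 2D
# keypoints that parameterize their grasp manifolds in image coordinates.
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn

NUM_MANIFOLD_KEYPOINTS = 8  # hypothetical; depends on the manifold parameterization

model = keypointrcnn_resnet50_fpn(
    weights=None,                     # would be trained on simulated grasp data
    num_classes=2,                    # background + graspable object (assumption)
    num_keypoints=NUM_MANIFOLD_KEYPOINTS,
)
model.eval()

# A single RGB image, values in [0, 1], shape (3, H, W).
image = torch.rand(3, 480, 640)

with torch.no_grad():
    (pred,) = model([image])

# Per detected object: a bounding box and the keypoints spanning
# its estimated grasp manifold in pixel coordinates.
for box, kps, score in zip(pred["boxes"], pred["keypoints"], pred["scores"]):
    if score < 0.5:  # arbitrary confidence threshold
        continue
    print("object box:", box.tolist())
    print("manifold keypoints (x, y, visibility):", kps.tolist())
```

A continuous manifold estimate could then be recovered from these discrete keypoints, e.g. by interpolation, which is what enables their use as implicit goal-set constraints in motion optimization.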