6-DoF robotic grasping is a long-standing yet unsolved problem. Recent methods exploit strong 3D networks to extract geometric grasping representations from depth sensors, demonstrating superior accuracy on common objects but performing unsatisfactorily on photometrically challenging objects, e.g., objects made of transparent or reflective materials. The bottleneck is that the surfaces of these objects cannot return accurate depth due to the absorption or refraction of light. In this paper, instead of exploiting the inaccurate depth data, we propose the first RGB-only 6-DoF grasping pipeline, MonoGraspNet, which utilizes stable 2D features to simultaneously handle arbitrary-object grasping and overcome the problems induced by photometrically challenging objects. MonoGraspNet leverages a keypoint heatmap and a normal map to recover 6-DoF grasping poses in our novel representation, parameterized by 2D keypoints with corresponding depth, grasping direction, grasping width, and angle. Extensive experiments in real scenes demonstrate that our method achieves competitive results on common objects and surpasses the depth-based competitor by a large margin on photometrically challenging objects. To further stimulate robotic manipulation research, we additionally annotate and open-source a multi-view, multi-scene real-world grasping dataset containing 120 objects of mixed photometric complexity with 20M accurate grasping labels.
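The grasp representation described above (a 2D keypoint with associated depth, approach direction, width, and angle) can be sketched as a small data structure. This is a minimal illustrative sketch, not the paper's actual implementation: the class and function names are hypothetical, and the back-projection simply applies the standard pinhole camera model to lift the keypoint to a 3D grasp center.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class KeypointGrasp:
    """Hypothetical 2D-keypoint grasp parameterization (illustrative only)."""
    u: float               # keypoint pixel column
    v: float               # keypoint pixel row
    depth: float           # metric depth at the keypoint (m)
    direction: np.ndarray  # unit 3D grasping (approach) direction
    width: float           # gripper opening width (m)
    angle: float           # in-plane rotation about the approach axis (rad)


def keypoint_to_3d(g: KeypointGrasp, fx: float, fy: float,
                   cx: float, cy: float) -> np.ndarray:
    """Back-project the 2D keypoint to a 3D grasp center (pinhole model)."""
    x = (g.u - cx) * g.depth / fx
    y = (g.v - cy) * g.depth / fy
    return np.array([x, y, g.depth])
```

A keypoint at the principal point back-projects onto the optical axis, so its 3D center is simply `[0, 0, depth]`; the direction, width, and angle fields then fix the remaining degrees of freedom of the 6-DoF grasp pose.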