In this work, we tackle 6-DoF grasp detection for transparent and specular objects, an important yet challenging problem in vision-based robotic systems, since depth cameras fail to sense the geometry of such objects. We propose, for the first time, a multiview RGB-based 6-DoF grasp detection network, GraspNeRF, that leverages a generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping in clutter. In contrast to existing NeRF-based 3-DoF grasp detection methods, which rely on densely captured input images and time-consuming per-scene optimization, our system performs zero-shot NeRF construction from sparse RGB inputs and reliably detects 6-DoF grasps, both in real time. The proposed framework jointly learns the generalizable NeRF and grasp detection in an end-to-end manner, optimizing the scene representation construction for grasping. For training data, we generate a large-scale, photorealistic, domain-randomized synthetic dataset of grasping in cluttered tabletop scenes, enabling direct transfer to the real world. Our extensive experiments in synthetic and real-world environments demonstrate that our method significantly outperforms all baselines in all experiments while running in real time. The project page can be found at https://pku-epic.github.io/GraspNeRF