In this work, we tackle 6-DoF grasp detection for transparent and specular objects, an important yet challenging problem in vision-based robotic systems, since depth cameras fail to sense the geometry of such objects. We propose, for the first time, a multi-view RGB-based 6-DoF grasp detection network, GraspNeRF, that leverages a generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping in clutter. In contrast to existing NeRF-based 3-DoF grasp detection methods, which rely on densely captured input images and time-consuming per-scene optimization, our system performs zero-shot NeRF construction from sparse RGB inputs and reliably detects 6-DoF grasps, both in real time. The proposed framework jointly learns the generalizable NeRF and the grasp detector in an end-to-end manner, optimizing the scene representation construction for grasping. For training data, we generate a large-scale, photorealistic, domain-randomized synthetic dataset of grasping in cluttered tabletop scenes, which enables direct transfer to the real world. Extensive experiments in both synthetic and real-world environments demonstrate that our method significantly outperforms all baselines while running in real time.
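To make the described pipeline concrete, below is a minimal, hypothetical sketch of a GraspNeRF-style architecture: sparse multi-view RGB images are encoded into 2D feature maps, back-projected and aggregated into a voxel feature grid (standing in for the generalizable NeRF scene representation), and a volumetric grasp head predicts per-voxel grasp quality, rotation, and gripper width. All module names, layer sizes, the workspace bounds, and the mean/variance cross-view aggregation are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of a generalizable-NeRF grasp pipeline (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Shared 2D CNN that turns each RGB view into a feature map."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1),
        )

    def forward(self, images):                # (V, 3, H, W)
        return self.net(images)               # (V, C, H, W)

class GraspNeRFSketch(nn.Module):
    def __init__(self, feat_dim=16, grid_res=40):
        super().__init__()
        self.encoder = ImageEncoder(feat_dim)
        self.grid_res = grid_res
        # 3D conv over aggregated features: per voxel, predict grasp quality (1),
        # rotation as a quaternion (4), and gripper width (1) -> 6 channels.
        self.grasp_head = nn.Conv3d(2 * feat_dim, 6, 3, padding=1)

    def forward(self, images, intrinsics, extrinsics):
        """images: (V, 3, H, W); intrinsics: (V, 3, 3); extrinsics: (V, 4, 4) world-to-camera."""
        V, _, H, W = images.shape
        feats = self.encoder(images)                          # (V, C, H, W)
        # Voxel centers of an assumed cubic workspace, [-0.15, 0.15] m per axis.
        r = self.grid_res
        lin = torch.linspace(-0.15, 0.15, r)
        xyz = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)
        pts = xyz.reshape(-1, 3)                              # (N, 3), N = r^3
        pts_h = torch.cat([pts, torch.ones(len(pts), 1)], dim=-1)  # (N, 4)
        sampled = []
        for v in range(V):
            cam = (extrinsics[v] @ pts_h.T).T[:, :3]          # camera-frame points
            uvz = (intrinsics[v] @ cam.T).T                   # projective coords
            uv = uvz[:, :2] / uvz[:, 2:].clamp(min=1e-6)      # pixel coords
            # Normalize to [-1, 1] for grid_sample.
            grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                                uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
            samp = F.grid_sample(feats[v:v + 1], grid.view(1, 1, -1, 2),
                                 align_corners=True)          # (1, C, 1, N)
            sampled.append(samp.view(feats.shape[1], -1))     # (C, N)
        stack = torch.stack(sampled)                          # (V, C, N)
        # Cross-view mean/variance aggregation, a common generalizable-NeRF trick.
        agg = torch.cat([stack.mean(0), stack.var(0)], dim=0) # (2C, N)
        vol = agg.view(1, -1, r, r, r)                        # (1, 2C, r, r, r)
        out = self.grasp_head(vol)                            # (1, 6, r, r, r)
        quality, rot, width = out[:, :1], out[:, 1:5], out[:, 5:]
        return quality.sigmoid(), F.normalize(rot, dim=1), width
```

Because the feature aggregation and the grasp head sit in one differentiable graph, gradients from a grasp loss flow back into the scene-representation construction, which is the end-to-end joint learning the abstract refers to; the sparse-view feature aggregation is what replaces dense image capture and per-scene NeRF optimization.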