Neural Radiance Fields (NeRFs) have been used successfully for scene representation, and recent works have built robotic navigation and manipulation systems on NeRF-based environment representations. Because object localization is the foundation of many robotic applications, we study object localization within a NeRF scene to further unlock the potential of NeRFs in robotic systems. We propose NeRF-Loc, a transformer-based framework that extracts 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and a camera view as input and produces labeled 3D bounding boxes of objects as output. Concretely, we design a pair of parallel transformer encoder branches, the coarse stream and the fine stream, to encode both the context and the details of target objects. The encoded features are then fused with attention layers to reduce ambiguities and enable accurate object localization. Compared with a conventional transformer-based method, our method achieves better performance. We also present NeRFLocBench, the first benchmark for object localization based on NeRF samples.
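The coarse/fine design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token counts, feature width, single-layer encoders, random weights, mean pooling, and the six-value box head `(cx, cy, cz, w, h, l)` are all assumptions chosen only to show how two parallel encoder branches might be fused with cross-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Scaled dot-product attention: (n, d) queries, (m, d) keys/values -> (n, d).
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values

def encode(tokens, w_q, w_k, w_v):
    # One self-attention layer with a residual connection,
    # a stand-in for a full transformer encoder branch.
    return tokens + attention(tokens @ w_q, tokens @ w_k, tokens @ w_v)

def fuse(fine, coarse):
    # Cross-attention: each fine token queries the coarse tokens for context.
    return fine + attention(fine, coarse, coarse)

rng = np.random.default_rng(0)
d = 16  # assumed feature width

def make_weights():
    # Random projection matrices; a real model would learn these.
    return [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]

# Assumed inputs: features sampled from a NeRF at two resolutions.
coarse_tokens = rng.normal(size=(8, d))   # few tokens, wide context
fine_tokens = rng.normal(size=(32, d))    # many tokens, local detail

coarse_feat = encode(coarse_tokens, *make_weights())
fine_feat = encode(fine_tokens, *make_weights())
fused = fuse(fine_feat, coarse_feat)

# Hypothetical box head: pool fused tokens, project to (cx, cy, cz, w, h, l).
w_box = rng.normal(scale=d ** -0.5, size=(d, 6))
box = fused.mean(axis=0) @ w_box

print(fused.shape, box.shape)  # (32, 16) (6,)
```

In this sketch the fine stream supplies the queries so that the output keeps the fine resolution while borrowing context from the coarse stream; the reverse direction, or a symmetric fusion, would be equally plausible readings of the abstract.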