Neural implicit representations have recently become popular in simultaneous localization and mapping (SLAM), especially in dense visual SLAM. However, previous works in this direction either rely on RGB-D sensors or require a separate monocular SLAM approach for camera tracking and do not produce high-fidelity dense 3D scene reconstructions. In this paper, we present NICER-SLAM, a dense RGB SLAM system that simultaneously optimizes camera poses and a hierarchical neural implicit map representation, which also allows for high-quality novel view synthesis. To facilitate the optimization process for mapping, we integrate additional supervision signals, including easy-to-obtain monocular geometric cues and optical flow, and introduce a simple warping loss to further enforce geometric consistency. Moreover, to boost performance in complicated indoor scenes, we propose a locally adaptive transformation from signed distance functions (SDFs) to density in the volume rendering equation. On both synthetic and real-world datasets, we demonstrate strong performance in dense mapping, tracking, and novel view synthesis, even competitive with recent RGB-D SLAM systems.
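To make the SDF-to-density idea concrete, the following is a minimal sketch and not the formulation used in NICER-SLAM. It assumes a Laplace-CDF style mapping of the kind commonly used in SDF-based volume rendering, with a sharpness parameter `beta`. It also assumes that "locally adaptive" means `beta` may vary by region, which is modeled here with a hypothetical dictionary-based voxel grid. The names `sdf_to_density`, `local_beta`, `DEFAULT_BETA`, and the grid layout are all illustrative assumptions.

```python
import math

# Assumed fallback sharpness for voxels without their own value (illustrative only).
DEFAULT_BETA = 0.1


def sdf_to_density(sdf: float, beta: float) -> float:
    """Map a signed distance to a volume density with a Laplace-CDF shape.

    Assumption: negative SDF means "inside the surface". Density tends to
    1/beta deep inside, equals 0.5/beta on the surface, and decays to 0 outside.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if sdf <= 0:  # on or inside the surface
        return (1.0 - 0.5 * math.exp(sdf / beta)) / beta
    return 0.5 * math.exp(-sdf / beta) / beta  # outside the surface


def local_beta(point, beta_grid, voxel_size):
    """Look up a per-voxel beta from a hypothetical sparse grid (dict keyed by voxel index)."""
    key = tuple(int(math.floor(c / voxel_size)) for c in point)
    return beta_grid.get(key, DEFAULT_BETA)


if __name__ == "__main__":
    grid = {(0, 0, 0): 0.02}  # a sharper transition in one voxel
    p = (0.05, 0.05, 0.05)
    print(sdf_to_density(-0.01, local_beta(p, grid, voxel_size=0.1)))
```

A smaller `beta` concentrates density near the zero level set, so letting it vary by region would allow sharper surfaces where the geometry is well constrained and smoother transitions elsewhere.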