There is an emerging trend of using neural implicit functions for map representation in Simultaneous Localization and Mapping (SLAM). Some pioneering works have achieved encouraging results on RGB-D SLAM. In this paper, we present a dense RGB SLAM method with a neural implicit map representation. To reach this challenging goal without depth input, we introduce a hierarchical feature volume to facilitate the implicit map decoder. This design effectively fuses shape cues across different scales to aid map reconstruction. Our method simultaneously solves for the camera motion and the neural implicit map by matching the rendered and input video frames. To ease optimization, we further propose a photometric warping loss in the spirit of multi-view stereo to better constrain the camera pose and scene geometry. We evaluate our method on commonly used benchmarks and compare it with modern RGB and RGB-D SLAM systems. Our method achieves more favorable results than previous methods and even surpasses some recent RGB-D SLAM methods. Our source code will be made publicly available.
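To make the photometric warping loss concrete, the sketch below shows the standard multi-view-stereo-style formulation it builds on: pixels of one frame are back-projected with a rendered depth map, reprojected into a second frame, and compared photometrically. This is an illustrative NumPy sketch under assumed conventions (pinhole intrinsics, nearest-neighbour sampling, grayscale images), not the paper's implementation; the function name and arguments are hypothetical.

```python
import numpy as np

def photometric_warp_loss(img_i, img_j, depth_i, K, T_ij):
    """L1 photometric loss between frame i and frame j warped into i's view.

    img_i, img_j : (H, W) grayscale images (hypothetical inputs)
    depth_i      : (H, W) depth rendered from the implicit map for frame i
    K            : (3, 3) pinhole camera intrinsics
    T_ij         : (4, 4) relative pose taking points from frame i to frame j
    """
    H, W = img_i.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW

    # Back-project pixels of frame i to 3D using the rendered depth.
    pts_i = np.linalg.inv(K) @ pix * depth_i.reshape(1, -1)
    pts_i_h = np.vstack([pts_i, np.ones((1, pts_i.shape[1]))])

    # Transform the points into frame j and project with the intrinsics.
    pts_j = (T_ij @ pts_i_h)[:3]
    proj = K @ pts_j
    uj = proj[0] / proj[2]
    vj = proj[1] / proj[2]

    # Keep pixels that land inside frame j in front of the camera,
    # and sample frame j with nearest-neighbour lookup.
    uj_r, vj_r = np.round(uj).astype(int), np.round(vj).astype(int)
    valid = (uj_r >= 0) & (uj_r < W) & (vj_r >= 0) & (vj_r < H) & (pts_j[2] > 0)

    warped = img_j[vj_r[valid], uj_r[valid]]
    return np.abs(img_i.reshape(-1)[valid] - warped).mean()
```

With the identity relative pose the warp maps every pixel to itself, so the loss between a frame and itself is zero; gradients of this residual with respect to pose and depth are what jointly constrain camera motion and scene geometry.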