We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). ESLAM reads RGB-D frames with unknown camera poses in a sequential manner and incrementally reconstructs the scene representation while estimating the current camera position in the scene. We incorporate the latest advances in Neural Radiance Fields (NeRF) into a SLAM system, resulting in an efficient and accurate dense visual SLAM method. Our scene representation consists of multi-scale, axis-aligned, mutually perpendicular feature planes and shallow decoders that, for each point in the continuous space, decode the interpolated features into Truncated Signed Distance Field (TSDF) and RGB values. Our extensive experiments on three standard datasets (Replica, ScanNet, and TUM RGB-D) show that ESLAM improves the 3D reconstruction and camera localization accuracy of state-of-the-art dense visual SLAM methods by more than 50%, while running up to 10 times faster and requiring no pre-training.
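To make the representation concrete, the following is a minimal sketch of querying axis-aligned feature planes and decoding the result into TSDF and RGB values. This is an illustrative assumption of the general tri-plane scheme, not the authors' implementation: the plane resolution, feature dimension, decoder width, and output activations here are all hypothetical choices, and ESLAM additionally uses multiple scales and separate geometry/appearance planes that are omitted for brevity.

```python
import numpy as np

def bilerp(plane, u, v):
    """Bilinearly interpolate an (H, W, C) feature plane at normalized coords u, v in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * plane[y0, x0] + fx * (1 - fy) * plane[y0, x1]
            + (1 - fx) * fy * plane[y1, x0] + fx * fy * plane[y1, x1])

def query(point, planes, W1, b1, W2, b2):
    """Decode one 3D point (normalized to [0, 1]^3) into (tsdf, rgb).

    `planes` holds three perpendicular axis-aligned feature planes (xy, xz, yz);
    the decoder is a single hidden layer, i.e. "shallow" in the sense of the abstract.
    """
    x, y, z = point
    # Project the point onto each plane, interpolate, and fuse by summation.
    feat = (bilerp(planes["xy"], x, y)
            + bilerp(planes["xz"], x, z)
            + bilerp(planes["yz"], y, z))
    h = np.maximum(feat @ W1 + b1, 0.0)        # hidden layer with ReLU
    out = h @ W2 + b2
    tsdf = np.tanh(out[0])                     # truncated signed distance in [-1, 1]
    rgb = 1.0 / (1.0 + np.exp(-out[1:4]))      # color channels squashed to [0, 1]
    return tsdf, rgb

# Usage with randomly initialized parameters (in the real system these are optimized
# per scene against depth and color observations).
rng = np.random.default_rng(0)
C, H, HID = 4, 8, 16
planes = {k: rng.normal(size=(H, H, C)) for k in ("xy", "xz", "yz")}
W1, b1 = rng.normal(size=(C, HID)), np.zeros(HID)
W2, b2 = rng.normal(size=(HID, 4)), np.zeros(4)
tsdf, rgb = query((0.3, 0.6, 0.9), planes, W1, b1, W2, b2)
```

Storing features on 2D planes rather than a dense 3D grid is what keeps the memory footprint growing quadratically instead of cubically with scene resolution, which underlies the claimed efficiency.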