We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). ESLAM reads RGB-D frames with unknown camera poses in a sequential manner and incrementally reconstructs the scene representation while estimating the current camera position in the scene. We incorporate the latest advances in Neural Radiance Fields (NeRF) into a SLAM system, resulting in an efficient and accurate dense visual SLAM method. Our scene representation consists of multi-scale axis-aligned perpendicular feature planes and shallow decoders that, for each point in the continuous space, decode the interpolated features into Truncated Signed Distance Field (TSDF) and RGB values. Our extensive experiments on two standard and recent datasets, Replica and ScanNet, show that ESLAM improves the accuracy of 3D reconstruction and camera localization over state-of-the-art dense visual SLAM methods by more than 50%, while running up to 10$\times$ faster and requiring no pre-training.
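To make the scene representation concrete, the sketch below shows how a query point can be decoded from axis-aligned feature planes: the point is projected onto the xy, xz, and yz planes, features are bilinearly interpolated on each plane, and a shallow decoder maps the combined feature to a TSDF value and an RGB color. This is a minimal single-scale illustration, not the paper's implementation; all names (`interp2d`, `decode_point`) and choices such as concatenating the three plane features and using a one-hidden-layer ReLU MLP are assumptions for clarity.

```python
import numpy as np

def interp2d(plane, u, v):
    """Bilinearly interpolate a (H, W, C) feature plane at continuous (u, v)."""
    H, W, _ = plane.shape
    u = np.clip(u, 0.0, H - 1 - 1e-6)
    v = np.clip(v, 0.0, W - 1 - 1e-6)
    u0, v0 = int(u), int(v)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[u0, v0]
            + (1 - du) * dv * plane[u0, v0 + 1]
            + du * (1 - dv) * plane[u0 + 1, v0]
            + du * dv * plane[u0 + 1, v0 + 1])

def decode_point(p, plane_xy, plane_xz, plane_yz, W1, b1, W2, b2):
    """Project a 3D point onto the three axis-aligned planes, gather
    interpolated features, and decode them with a shallow MLP.
    Returns (tsdf, rgb); the 4 outputs are split as 1 TSDF + 3 color values."""
    x, y, z = p
    feat = np.concatenate([interp2d(plane_xy, x, y),
                           interp2d(plane_xz, x, z),
                           interp2d(plane_yz, y, z)])
    hidden = np.maximum(feat @ W1 + b1, 0.0)   # one hidden ReLU layer
    out = hidden @ W2 + b2
    return out[0], out[1:4]
```

In the full method, the planes and decoder weights are optimized jointly from the incoming RGB-D frames; storing features on 2D planes rather than in a dense 3D grid is what keeps memory growth quadratic rather than cubic in resolution.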