A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real time. None of the previous learning-based or non-learning-based visual SLAM systems satisfies all these needs due to the intrinsic limitations of their components. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully combines an implicit neural representation (NeRF) with visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with a monocular camera since it only needs RGB inputs, making it widely applicable to the real world. We validate its effectiveness on various challenging benchmarks. Results show that our SLAM is up to 800x faster than the strong baseline while achieving superior rendering outcomes.