In this paper, we present a tightly-coupled visual-inertial, object-level, multi-instance dynamic SLAM system. Even in extremely dynamic scenes, it can robustly optimise the camera pose, velocity, and IMU biases, and build a dense object-level 3D reconstruction of the environment. Thanks to its robust sensor and object tracking, our system can track and reconstruct the geometry, semantics, and motion of arbitrary objects by incrementally fusing associated colour, depth, semantic, and foreground-object probabilities into each object model. In addition, when an object is lost or moves outside the camera's field of view, our system can reliably recover its pose upon re-observation. We demonstrate the robustness and accuracy of our method through quantitative and qualitative evaluation on real-world data sequences.