Visual simultaneous localization and mapping (SLAM) systems struggle to detect loop closures under large viewpoint changes. In this paper, we present an object-based loop closure detection method that exploits the spatial layout and semantic consistency of a 3D scene graph. First, we propose an object-level data association approach that combines semantic labels, intersection over union (IoU), object color, and object embeddings. Next, multi-view bundle adjustment over the associated objects jointly optimizes the poses of objects and cameras. We represent the refined objects as a 3D spatial graph that carries both semantic and topological information. We then propose a graph matching approach that selects corresponding objects according to the similarity of the structural layout and semantic properties of each vertex's neighbors. Finally, we jointly optimize camera trajectories and object poses in an object-level pose graph optimization, which yields a globally consistent map. Experimental results demonstrate that our data association approach constructs more accurate 3D semantic maps, and that our loop closure method is more robust than existing point-based and object-based methods under large viewpoint changes.
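The abstract names four cues for object-level data association (semantic label, IoU, color, embedding) but does not say how they are combined. The sketch below is one plausible combination under loudly stated assumptions, not the authors' method: a label mismatch vetoes a match, and otherwise the score is a weighted sum of axis-aligned 3D box IoU, color-histogram intersection, and embedding cosine similarity. The class and function names, the weights `(0.4, 0.2, 0.4)`, the threshold `0.5`, and the greedy one-to-one matching are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectObservation:
    """Hypothetical object record; field names are illustrative only."""
    label: str                    # semantic class label from the detector
    box_min: Vec3                 # axis-aligned 3D bounding box, min corner
    box_max: Vec3                 # axis-aligned 3D bounding box, max corner
    color_hist: Sequence[float]   # normalized color histogram (sums to 1)
    embedding: Sequence[float]    # appearance embedding vector


def iou_3d(a: ObjectObservation, b: ObjectObservation) -> float:
    """IoU of two axis-aligned 3D boxes (assumption: boxes are axis-aligned)."""
    inter = 1.0
    for i in range(3):
        lo = max(a.box_min[i], b.box_min[i])
        hi = min(a.box_max[i], b.box_max[i])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = math.prod(a.box_max[i] - a.box_min[i] for i in range(3))
    vol_b = math.prod(b.box_max[i] - b.box_min[i] for i in range(3))
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def color_similarity(a: ObjectObservation, b: ObjectObservation) -> float:
    """Histogram intersection of two normalized histograms, in [0, 1]."""
    return sum(min(x, y) for x, y in zip(a.color_hist, b.color_hist))


def cosine_similarity(a: ObjectObservation, b: ObjectObservation) -> float:
    """Cosine similarity of the two embedding vectors."""
    dot = sum(x * y for x, y in zip(a.embedding, b.embedding))
    na = math.sqrt(sum(x * x for x in a.embedding))
    nb = math.sqrt(sum(y * y for y in b.embedding))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def association_score(a: ObjectObservation, b: ObjectObservation,
                      weights: Tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    """Hypothetical combined score: label gate, then weighted sum of cues."""
    if a.label != b.label:
        return 0.0  # different semantic classes never associate
    w_iou, w_color, w_emb = weights
    return (w_iou * iou_3d(a, b)
            + w_color * color_similarity(a, b)
            + w_emb * max(0.0, cosine_similarity(a, b)))


def associate(detections: List[ObjectObservation],
              map_objects: List[ObjectObservation],
              threshold: float = 0.5) -> List[Optional[int]]:
    """Greedy one-to-one matching: best-scoring pairs above threshold first."""
    candidates = []
    for i, d in enumerate(detections):
        for j, m in enumerate(map_objects):
            s = association_score(d, m)
            if s >= threshold:
                candidates.append((s, i, j))
    candidates.sort(reverse=True)  # highest score first
    result: List[Optional[int]] = [None] * len(detections)
    used = set()
    for _, i, j in candidates:
        if result[i] is None and j not in used:
            result[i] = j
            used.add(j)
    return result  # None means "new object, not in the map yet"
```

For example, a slightly shifted chair detection matches the mapped chair, a table matches the mapped table, and an unseen lamp gets `None` and would be inserted as a new map object. Gating on the label first keeps the cheap check in front of the geometric and appearance cues; a real system might instead use learned weights or a Hungarian assignment rather than greedy matching.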