Visual simultaneous localization and mapping (SLAM) systems struggle to detect loop closures under large viewpoint changes. In this paper, we present an object-based loop closure detection method that exploits the spatial layout and semantic consistency of a 3D scene graph. First, we propose an object-level data association approach that combines semantic labels, intersection over union (IoU), object color, and object embeddings. Next, multi-view bundle adjustment over the associated objects jointly optimizes the poses of objects and cameras. We represent the refined objects as a 3D spatial graph that encodes semantics and topology. We then propose a graph matching approach that selects corresponding objects based on the structural layout and semantic similarity of each vertex's neighbors. Finally, we jointly optimize camera trajectories and object poses in an object-level pose graph optimization, which yields a globally consistent map. Experimental results demonstrate that the proposed data association approach constructs more accurate 3D semantic maps, and that our loop closure method is more robust than existing point-based and object-based methods under large viewpoint changes.
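As an illustration of the object-level data association step, the following is a minimal sketch of how the four cues named above (semantic label, IoU, object color, object embedding) could be fused into a single matching score. It is not the authors' implementation: the axis-aligned box format, the label gate, the weights, and the names `iou_3d`, `cosine`, and `association_score` are all assumptions made for this example.

```python
import math


def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax).

    Assumption: boxes are axis-aligned; the paper does not specify this.
    """
    inter = 1.0
    for i in range(3):
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0  # no overlap along this axis, so no intersection
        inter *= overlap

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def association_score(det, obj, weights=(0.5, 0.2, 0.3)):
    """Score how well a detection matches a mapped object, in [0, 1].

    Hypothetical fusion rule: the label acts as a hard gate, and the
    remaining cues are combined with illustrative weights.
    """
    if det["label"] != obj["label"]:
        return 0.0
    w_iou, w_color, w_emb = weights
    overlap = iou_3d(det["box"], obj["box"])
    # Colors are assumed to be mean RGB values scaled to [0, 1].
    color_diff = sum(abs(c1 - c2) for c1, c2 in zip(det["color"], obj["color"]))
    color_sim = 1.0 - color_diff / 3.0
    # Negative cosine values are clipped so the score stays non-negative.
    emb_sim = max(0.0, cosine(det["embedding"], obj["embedding"]))
    return w_iou * overlap + w_color * color_sim + w_emb * emb_sim
```

In such a scheme, a detection would be assigned to the mapped object with the highest score above some threshold, and a new object would be created otherwise. The threshold and the weights would need to be tuned per dataset.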