Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes

In Simultaneous Localization and Mapping (SLAM), Loop Closure Detection (LCD) is essential for minimizing drift by recognizing previously visited places. Visual Bag-of-Words (vBoW) has been the LCD algorithm of choice for many state-of-the-art SLAM systems. It uses a set of visual features to provide robust place recognition but fails to perceive the semantics of, or the spatial relationships between, feature points. Previous work has mainly addressed these issues by combining vBoW with semantic and spatial information from objects in the scene. However, these approaches are unable to exploit the spatial information of local visual features and lack a structure that unifies semantic objects and visual features, which limits the symbiosis between the two components. This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically. Our novel graph-based LCD system utilizes the unified graph structure by applying a Weisfeiler-Lehman graph kernel with temporal constraints to robustly predict loop closure candidates. Evaluation of the proposed system shows that a unified graph structure incorporating semantic objects and visual features improves LCD prediction accuracy, illustrating that the proposed graph structure provides a strong symbiosis between these two complementary components. It also outperforms other machine learning algorithms, such as SVM, Decision Tree, Random Forest, Neural Network and GNN-based Graph Matching Networks. Furthermore, it performs well at detecting loop closure candidates earlier than state-of-the-art SLAM systems, demonstrating that the extended semantic and spatial awareness from the unified graph structure significantly impacts LCD performance.
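
To make the idea of a Weisfeiler-Lehman graph kernel with a temporal constraint more concrete, the following is a minimal sketch and not the SymbioLCD2 implementation. It assumes each frame is a small labeled graph given as an `(adjacency, labels)` pair, with hypothetical node labels such as an object class or a generic `"feature"` tag. The function names, the `min_gap` and `threshold` parameters, and the rule "only compare against frames at least `min_gap` steps older" are all illustrative assumptions; the abstract does not specify how the temporal constraints are defined.

```python
import math
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count node labels over several Weisfeiler-Lehman refinement rounds.

    adj: dict mapping each node to a list of neighbouring nodes.
    labels: dict mapping each node to its initial label (assumed strings here).
    """
    current = {node: str(label) for node, label in labels.items()}
    feats = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node in adj:
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbours = tuple(sorted(current[m] for m in adj[node]))
            refined[node] = (current[node], neighbours)
        current = refined
        feats.update(current.values())
    return feats


def wl_kernel(g1, g2, iterations=2):
    """WL subtree kernel: dot product of the two label-count vectors."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())


def normalized_wl(g1, g2, iterations=2):
    """Kernel value scaled to [0, 1] so graphs of different sizes are comparable."""
    k11 = wl_kernel(g1, g1, iterations)
    k22 = wl_kernel(g2, g2, iterations)
    if k11 == 0 or k22 == 0:
        return 0.0
    return wl_kernel(g1, g2, iterations) / math.sqrt(k11 * k22)


def loop_candidates(frames, min_gap=3, threshold=0.8, iterations=2):
    """Return (current_index, matched_index, score) triples.

    Assumed temporal constraint: a frame is only compared with frames that are
    at least `min_gap` steps older, so adjacent frames are never reported.
    """
    hits = []
    for i in range(len(frames)):
        best = None
        for j in range(0, i - min_gap + 1):
            score = normalized_wl(frames[i], frames[j], iterations)
            if score >= threshold and (best is None or score > best[1]):
                best = (j, score)
        if best is not None:
            hits.append((i, best[0], best[1]))
    return hits
```

In this sketch, a sequence in which the scene graph of frame 4 equals that of frame 0, with different graphs in between, yields a single candidate pairing frame 4 with frame 0. A real system would build the graphs from detected objects and extracted visual features, which is outside the scope of this example.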
