Fiducial markers can encode rich information about the environment and can help Visual SLAM (VSLAM) approaches reconstruct maps with practical semantic information. Current marker-based VSLAM approaches mainly use markers to improve feature detection in low-feature environments and/or to incorporate loop closure constraints, and generate only low-level geometric maps that are prone to inaccuracies in complex environments. To bridge this gap, this paper presents a VSLAM approach that uses a monocular camera together with fiducial markers to generate hierarchical representations of the environment while improving the camera pose estimate. The proposed approach detects semantic entities in the surroundings, including walls, corridors, and rooms encoded within the markers, and adds the appropriate topological constraints among them. Experimental results on a real-world dataset collected with a robot demonstrate that, owing to these additional constraints, the proposed approach outperforms a traditional marker-based VSLAM baseline in accuracy while also creating enhanced map representations. Furthermore, the quality of the reconstructed map compares satisfactorily with that of a map reconstructed using a LiDAR SLAM approach.
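As a rough illustration of the hierarchy the abstract describes (markers attached to walls, walls grouped into rooms or corridors, with topological links between the levels), the following minimal Python sketch may help. It is not the paper's implementation: the lookup tables `MARKER_TO_WALL` and `WALL_TO_PLACE`, the function names, and the entity labels are all hypothetical, and the actual approach additionally optimizes camera poses and geometry, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical lookup tables (assumptions for illustration only):
# which wall each marker is attached to, and which room or corridor
# each wall belongs to.
MARKER_TO_WALL = {0: "wall_a", 1: "wall_a", 2: "wall_b", 3: "wall_c"}
WALL_TO_PLACE = {"wall_a": "room_1", "wall_b": "room_1", "wall_c": "corridor_1"}


def build_hierarchy(detected_marker_ids):
    """Group detected markers into walls, and walls into rooms/corridors."""
    walls = defaultdict(set)
    places = defaultdict(set)
    for marker_id in detected_marker_ids:
        wall = MARKER_TO_WALL.get(marker_id)
        if wall is None:
            continue  # marker carries no semantic annotation in this sketch
        walls[wall].add(marker_id)
        place = WALL_TO_PLACE.get(wall)
        if place is not None:
            places[place].add(wall)
    return dict(walls), dict(places)


def topological_edges(walls, places):
    """List marker-to-wall and wall-to-place links in a deterministic order."""
    edges = []
    for wall, markers in sorted(walls.items()):
        edges += [(f"marker_{m}", wall) for m in sorted(markers)]
    for place, wall_names in sorted(places.items()):
        edges += [(w, place) for w in sorted(wall_names)]
    return edges
```

Calling `build_hierarchy([0, 2, 3, 7])` groups markers 0, 2, and 3 under their walls and ignores the unannotated marker 7; `topological_edges` then returns the links that a graph-based back end could treat as additional constraints.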