Place recognition is indispensable for a drift-free localization system. Because environments vary in appearance and structure, place recognition using a single modality has limitations. In this paper, we propose a bi-modal place recognition method that extracts a compound global descriptor from two modalities, vision and LiDAR. Specifically, we build an elevation image generated from the point cloud as a discriminative structural representation. Based on the 3D information, we derive correspondences between 3D points and image pixels, through which pixel-wise visual features are inserted into the elevation map grids. In this way, we fuse structural and visual features in a consistent bird's-eye view frame, yielding a semantic feature representation with sensible geometry, namely CORAL. Comparisons on the Oxford RobotCar dataset show that CORAL outperforms other state-of-the-art methods. We also demonstrate that our network generalizes to other scenes and sensor configurations on cross-city datasets.
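To make the fusion step concrete, below is a minimal NumPy sketch of the two operations the abstract describes: rasterizing the point cloud into a bird's-eye-view elevation image, and projecting 3D points into the camera image to lift pixel-wise features into the same grid cells. The function name, the grid resolution and extent defaults, the max-height elevation rule, and the pinhole model with intrinsics `K` and extrinsics `T_cam_lidar` are all illustrative assumptions, not the paper's actual implementation, which operates on learned feature maps inside a network.

```python
import numpy as np

def coral_style_bev_fusion(points, img_feats, K, T_cam_lidar,
                           grid_size=0.5, extent=40.0):
    """Fuse LiDAR structure and per-pixel visual features in one BEV grid.

    points:       (N, 3) LiDAR points in the LiDAR frame
    img_feats:    (H, W, C) per-pixel visual feature map
    K:            (3, 3) camera intrinsics (assumed pinhole model)
    T_cam_lidar:  (4, 4) LiDAR-to-camera extrinsic transform
    """
    n = int(2 * extent / grid_size)
    elev = np.zeros((n, n), dtype=np.float32)
    bev_feats = np.zeros((n, n, img_feats.shape[2]), dtype=np.float32)

    # Step 1: BEV grid indices for every point; drop points outside the extent.
    ix = np.floor((points[:, 0] + extent) / grid_size).astype(int)
    iy = np.floor((points[:, 1] + extent) / grid_size).astype(int)
    in_grid = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)

    # Step 2: project points into the camera image (homogeneous pinhole model).
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    uvw = (K @ pts_cam.T).T
    w = uvw[:, 2]
    safe = w > 1e-6                      # keep only points in front of the camera
    u = np.full(len(points), -1, dtype=int)
    v = np.full(len(points), -1, dtype=int)
    u[safe] = (uvw[safe, 0] / w[safe]).astype(int)
    v[safe] = (uvw[safe, 1] / w[safe]).astype(int)
    H, W = img_feats.shape[:2]
    in_img = safe & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    for i in np.nonzero(in_grid)[0]:
        # Elevation image: keep the highest point observed in each cell.
        elev[iy[i], ix[i]] = max(elev[iy[i], ix[i]], points[i, 2])
        if in_img[i]:
            # Insert the pixel-wise visual feature into the same BEV cell,
            # so structure and appearance share one consistent frame.
            bev_feats[iy[i], ix[i]] = img_feats[v[i], u[i]]
    return elev, bev_feats
```

The key design point this sketch mirrors is that both modalities are indexed by the same grid coordinates, so a downstream network can consume the concatenated structural and visual channels as a single colored BEV representation.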