Place recognition is indispensable for a drift-free localization system. Because the environment varies over time, place recognition that relies on a single modality has inherent limitations. In this paper, we propose a bi-modal place recognition method that extracts a compound global descriptor from two modalities, vision and LiDAR. Specifically, we first build an elevation image generated from 3D points as a structural representation. Then, we derive correspondences between 3D points and image pixels, which are further used to merge pixel-wise visual features into the elevation map grids. In this way, we fuse structural and visual features in a consistent bird's-eye view frame, yielding a semantic representation named CORAL; the whole network is called CORAL-VLAD. Comparisons on the Oxford RobotCar dataset show that CORAL-VLAD outperforms other state-of-the-art methods. We also demonstrate that our network generalizes to other scenes and sensor configurations on cross-city datasets.
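To make the fusion step concrete, the sketch below illustrates the two operations the abstract describes: projecting 3D points into the image to gather pixel-wise visual features, then accumulating elevation and those features into a shared bird's-eye view grid. This is a minimal illustration under assumed conventions (camera-frame points, a pinhole intrinsic matrix, a precomputed CNN feature map); the function name `fuse_lidar_image_features` and all parameters are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def fuse_lidar_image_features(points_cam, K, feat_map, grid_size=(128, 128),
                              cell=0.5, origin=(-32.0, -32.0)):
    """Hypothetical sketch of BEV fusion, not the paper's code.

    points_cam: (N, 3) LiDAR points already in the camera frame
                (x right, y down, z forward).
    K:          (3, 3) camera intrinsic matrix.
    feat_map:   (H, W, C) pixel-wise visual feature map (e.g. from a CNN).
    Returns a (gx, gy, 1 + C) grid: elevation channel + averaged visual features.
    """
    H, W, C = feat_map.shape
    gx, gy = grid_size

    # Keep points in front of the camera, then project with the pinhole model.
    pts = points_cam[points_cam[:, 2] > 0.1]
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]                        # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    # 3D-point-to-pixel correspondence: keep points that land inside the image.
    in_img = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    pts, u, v = pts[in_img], u[in_img], v[in_img]
    vis_feat = feat_map[v, u]                          # (M, C) per-point features

    # Bin points into BEV cells on the ground plane (x-z in the camera frame).
    ix = ((pts[:, 0] - origin[0]) / cell).astype(int)
    iz = ((pts[:, 2] - origin[1]) / cell).astype(int)
    in_grid = (ix >= 0) & (ix < gx) & (iz >= 0) & (iz < gy)
    ix, iz = ix[in_grid], iz[in_grid]
    pts, vis_feat = pts[in_grid], vis_feat[in_grid]

    elevation = np.full((gx, gy), -np.inf)             # structural channel
    feat_sum = np.zeros((gx, gy, C))                   # visual channels
    count = np.zeros((gx, gy))

    for i in range(len(ix)):
        # Height above the camera: y points down, so elevation is -y.
        elevation[ix[i], iz[i]] = max(elevation[ix[i], iz[i]], -pts[i, 1])
        feat_sum[ix[i], iz[i]] += vis_feat[i]
        count[ix[i], iz[i]] += 1

    feat_avg = feat_sum / np.maximum(count, 1)[..., None]
    elevation[np.isinf(elevation)] = 0.0               # empty cells default to 0
    return np.concatenate([elevation[..., None], feat_avg], axis=-1)
```

In this sketch the compound representation is simply the channel-wise concatenation of the structural (elevation) and visual (averaged feature) grids in the same bird's-eye view frame; a global descriptor network such as a NetVLAD head would then be applied on top of this fused tensor.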