Place recognition is an important technique for autonomous cars to achieve full autonomy, since it can provide an initial guess to online localization algorithms. Although current methods based on images or point clouds have achieved satisfactory performance, localizing images on a large-scale point cloud map remains a fairly unexplored problem. This cross-modal matching task is challenging due to the difficulty of extracting consistent descriptors from images and point clouds. In this paper, we propose the I2P-Rec method to solve the problem by transforming the cross-modal data into the same modality. Specifically, we leverage the recent success of depth estimation networks to recover point clouds from images. We then project the point clouds into Bird's Eye View (BEV) images. Using the BEV image as an intermediate representation, we extract global features with a Convolutional Neural Network followed by a NetVLAD layer to perform matching. We evaluate our method on the KITTI dataset. The experimental results show that, with only a small set of training data, I2P-Rec achieves a Top-1 recall rate of over 90\%. It also generalizes well to unknown environments, achieving Top-1\% recall rates of over 80\% and 90\% when localizing monocular images and stereo images on point cloud maps, respectively.
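The BEV projection step described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact encoding: the grid ranges, cell resolution, and max-height-per-cell channel are assumptions chosen for clarity (the actual method may use different channels, e.g. occupancy or point density).

```python
import numpy as np

def point_cloud_to_bev(points, x_range=(-40.0, 40.0), y_range=(-40.0, 40.0),
                       resolution=0.25):
    """Project an (N, 3) point cloud onto a 2-D Bird's Eye View grid.

    Each cell stores the maximum point height (z) falling into it,
    normalized to [0, 1]; empty cells stay 0. All parameters are
    illustrative defaults, not values from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Keep only points inside the chosen ground-plane window.
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]))
    x, y, z = x[mask], y[mask], z[mask]

    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    rows = ((x - x_range[0]) / resolution).astype(int)
    cols = ((y - y_range[0]) / resolution).astype(int)

    bev = np.zeros((h, w), dtype=np.float32)
    # Unbuffered in-place max: keeps the highest z per cell even
    # when several points map to the same (row, col) index.
    np.maximum.at(bev, (rows, cols), z)
    if bev.max() > 0:
        bev /= bev.max()
    return bev
```

The resulting 2-D array can be treated as a single-channel image and fed to a CNN backbone; point clouds recovered from image depth estimates and those from the LiDAR map pass through the same projection, which is what makes the two modalities comparable.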