Visual localization plays an important role in intelligent robotics and autonomous driving, especially when GNSS accuracy is unreliable. Recently, camera localization in LiDAR maps has attracted increasing attention for its low cost and potential robustness to illumination and weather changes. However, the commonly used pinhole camera has a narrow field of view, providing limited information compared with the omni-directional LiDAR data. To overcome this limitation, we focus on correlating 360° equirectangular images with point clouds, proposing an end-to-end learnable network that performs cross-modal visual localization by establishing similarity in a high-dimensional feature space. Inspired by the attention mechanism, we optimize the network to capture salient features for comparing images and point clouds. We construct several sequences containing 360° equirectangular images and corresponding point clouds based on the KITTI-360 dataset and conduct extensive experiments. The results demonstrate the effectiveness of our approach.
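To make the idea of attention-weighted cross-modal similarity concrete, the sketch below shows one possible way such a module could be structured in PyTorch. It is an illustrative assumption, not the paper's actual architecture: the class names (`AttentionPool`, `CrossModalSimilarity`), the embedding dimension, and the cosine-similarity comparison are all hypothetical choices used only to show how per-location image and point-cloud descriptors might be pooled with learned attention and compared in a shared feature space.

```python
# Hypothetical sketch (not the authors' released code): attention-weighted
# pooling of local descriptors and cosine similarity in a shared space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Pools N local descriptors into one global descriptor with learned
    per-descriptor attention weights (salient locations get higher weight)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                          # feats: (B, N, dim)
        w = torch.softmax(self.score(feats), dim=1)    # (B, N, 1)
        return (w * feats).sum(dim=1)                  # (B, dim)

class CrossModalSimilarity(nn.Module):
    """Projects image and point-cloud descriptors into a shared embedding
    space and returns their cosine similarity."""
    def __init__(self, img_dim, pcd_dim, embed_dim=256):
        super().__init__()
        self.img_pool = AttentionPool(img_dim)
        self.pcd_pool = AttentionPool(pcd_dim)
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.pcd_proj = nn.Linear(pcd_dim, embed_dim)

    def forward(self, img_feats, pcd_feats):
        g_img = F.normalize(self.img_proj(self.img_pool(img_feats)), dim=-1)
        g_pcd = F.normalize(self.pcd_proj(self.pcd_pool(pcd_feats)), dim=-1)
        return (g_img * g_pcd).sum(dim=-1)             # cosine similarity per pair

# Usage example: batch of 2, 1024 image descriptors and 4096 point
# descriptors, each 128-dimensional (shapes are placeholders).
sim = CrossModalSimilarity(img_dim=128, pcd_dim=128)(
    torch.randn(2, 1024, 128), torch.randn(2, 4096, 128))
print(sim.shape)  # torch.Size([2])
```

In a retrieval-style localization pipeline, such a similarity score would be computed between the query equirectangular image and candidate point-cloud submaps, with the highest-scoring submap taken as the coarse localization result; the exact matching and supervision strategy used in the paper is not specified here.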