Visual localization is a crucial component in applications such as mobile robots and autonomous driving. Image retrieval is an efficient and effective technique in image-based localization methods. Due to the drastic variability of environmental conditions, e.g., illumination, seasonal, and weather changes, retrieval-based visual localization is severely degraded and becomes a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. Then, a novel gradient-weighted similarity activation mapping (Grad-SAM) loss is incorporated for fine localization with high accuracy. We also propose a new adaptive triplet loss to boost the metric learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of the models trained without and with the Grad-SAM loss. Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMU-Seasons dataset. The strong generalization ability of our approach is verified on the RobotCar dataset using models pre-trained on the urban part of the CMU-Seasons dataset. Our method performs on par with or even outperforms state-of-the-art image-based localization baselines at medium and high precision, especially in challenging environments with illumination variation, vegetation, and night-time images. Moreover, real-site experiments have been conducted to validate the efficiency and effectiveness of the coarse-to-fine strategy for localization.
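The coarse-to-fine retrieval pipeline summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `coarse_model`, `fine_model`, and `TOP_K` are assumptions, standing in for the model trained without the Grad-SAM loss (coarse shortlisting by descriptor similarity) and the model trained with it (fine re-ranking of the shortlist).

```python
# Minimal sketch of a coarse-to-fine image retrieval pipeline (assumed names).
import numpy as np

TOP_K = 20  # assumed shortlist size for the coarse stage


def l2_normalize(x, axis=-1, eps=1e-12):
    """L2-normalize descriptors so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def coarse_to_fine_retrieve(query_img, db_imgs, coarse_model, fine_model):
    """Return the index of the best-matching database image.

    coarse_model / fine_model: callables mapping an image to a 1-D descriptor,
    e.g. global embeddings from the models trained without / with Grad-SAM.
    """
    # Coarse stage: shortlist the database by cosine similarity.
    q_coarse = l2_normalize(coarse_model(query_img))                     # (D,)
    db_coarse = l2_normalize(np.stack([coarse_model(im) for im in db_imgs]))  # (N, D)
    shortlist = np.argsort(-(db_coarse @ q_coarse))[:TOP_K]

    # Fine stage: re-rank only the shortlisted candidates.
    q_fine = l2_normalize(fine_model(query_img))
    db_fine = l2_normalize(np.stack([fine_model(db_imgs[i]) for i in shortlist]))
    return int(shortlist[np.argmax(db_fine @ q_fine)])
```

Restricting the fine model to the top-K shortlist keeps the re-ranking cost independent of the database size, which is the efficiency argument behind the sequential combination of the two models.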