Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization

Visual localization is a crucial component in mobile robot and autonomous driving applications. Image retrieval is an efficient and effective technique for image-based localization. However, drastic variability in environmental conditions, e.g. illumination, seasonal, and weather changes, severely degrades retrieval-based visual localization and makes it a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy. We also propose a new adaptive triplet loss to boost the contrastive learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of models without and with the Grad-SAM loss. Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMU-Seasons dataset. The strong generalization ability of our approach is verified on the RobotCar dataset using models pre-trained on the urban part of the CMU-Seasons dataset. Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines at medium or high precision, especially in challenging environments with illumination variance, vegetation, and night-time images. The code and pretrained models are available at https://github.com/HanjiangHu/DISAM.
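The coarse-to-fine retrieval idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity, and the `top_k` shortlist size are assumptions made for illustration. The "coarse" and "fine" embeddings stand in for the outputs of the models trained without and with the Grad-SAM loss, respectively.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_to_fine_retrieve(query_coarse, query_fine, db_coarse, db_fine, top_k=3):
    """Return the index of the best-matching database image.

    Hypothetical two-stage retrieval:
      1. rank all database images by coarse-embedding similarity, keep top_k;
      2. re-rank only those candidates by fine-embedding similarity.
    """
    coarse_scores = [cosine_similarity(query_coarse, e) for e in db_coarse]
    candidates = sorted(
        range(len(db_coarse)), key=lambda i: coarse_scores[i], reverse=True
    )[:top_k]
    return max(candidates, key=lambda i: cosine_similarity(query_fine, db_fine[i]))
```

In this sketch the fine stage only sees the shortlist produced by the coarse stage, so a database image with a high fine similarity but a low coarse similarity is never returned. That is the usual trade-off of a sequential pipeline: the second model is evaluated on far fewer candidates, at the cost of depending on the first model's recall.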
