Large-scale visual place recognition (VPR) is inherently challenging because not all visual cues in an image are beneficial to the task. To highlight task-relevant visual cues in the feature embedding, existing attention mechanisms are either based on handcrafted rules or trained in a purely data-driven manner. To bridge the gap between these two approaches, we propose a novel Semantic Reinforced Attention Learning Network (SRALNet), in which the inferred attention benefits from both semantic priors and data-driven fine-tuning. The contribution is two-fold. (1) To suppress misleading local features, an interpretable local weighting scheme is proposed based on the hierarchical feature distribution. (2) By exploiting the interpretability of the local weighting scheme, a semantic-constrained initialization is proposed so that the local attention can be reinforced by semantic priors. Experiments demonstrate that our method outperforms state-of-the-art techniques on city-scale VPR benchmark datasets.
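To make the idea of attention-weighted local features concrete, the following is a minimal sketch of attention-reweighted local feature aggregation for retrieval-style VPR. It is not the paper's SRALNet architecture (the abstract does not specify it); the module and parameter names here are hypothetical illustrations of the general mechanism of predicting per-location weights and suppressing task-irrelevant regions before aggregation.

```python
# Minimal sketch (assumed generic design, not the paper's exact SRALNet):
# predict a per-location attention weight, reweight local CNN features,
# then aggregate into a global descriptor for retrieval.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAggregation(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        # 1x1 convolution predicts a scalar attention score per spatial location
        self.score = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) local features from a CNN backbone
        attn = torch.sigmoid(self.score(feats))   # (B, 1, H, W) per-location weights
        weighted = feats * attn                   # suppress misleading local features
        desc = weighted.sum(dim=(2, 3))           # aggregate into a global descriptor
        return F.normalize(desc, p=2, dim=1)      # L2-normalize for nearest-neighbor search

# Usage (hypothetical): global_desc = AttentionAggregation(512)(backbone_features)
```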