Large-scale visual place recognition (VPR) is inherently challenging because not all visual cues in an image are beneficial to the task. To highlight task-relevant visual cues in the feature embedding, existing attention mechanisms are either based on handcrafted rules or trained in a purely data-driven manner. To bridge the gap between these two approaches, we propose a novel Semantic Reinforced Attention Learning Network (SRALNet), in which the inferred attention benefits from both semantic priors and data-driven fine-tuning. The contribution is two-fold. (1) To suppress misleading local features, an interpretable local weighting scheme is proposed based on the hierarchical feature distribution. (2) By exploiting the interpretability of the local weighting scheme, a semantic-constrained initialization is proposed so that the local attention can be reinforced by semantic priors. Experiments demonstrate that our method outperforms state-of-the-art techniques on city-scale VPR benchmark datasets.
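To make the idea of attention-weighted local features concrete, the following is a minimal sketch of attention-reweighted local feature aggregation for retrieval-style VPR. It is not the paper's SRALNet architecture (the abstract does not specify it); the module and parameter names here are hypothetical illustrations of the general mechanism of predicting per-location weights and suppressing task-irrelevant regions before aggregation.

```python
# Minimal sketch (assumed generic design, not the paper's exact SRALNet):
# predict a per-location attention weight, reweight local CNN features,
# then aggregate into a global descriptor for retrieval.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAggregation(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        # 1x1 convolution predicts a scalar attention score per spatial location
        self.score = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) local features from a CNN backbone
        attn = torch.sigmoid(self.score(feats))   # (B, 1, H, W) per-location weights
        weighted = feats * attn                   # suppress misleading local features
        desc = weighted.sum(dim=(2, 3))           # aggregate into a global descriptor
        return F.normalize(desc, p=2, dim=1)      # L2-normalize for nearest-neighbor search

# Usage (hypothetical): global_desc = AttentionAggregation(512)(backbone_features)
```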