Visual Place Recognition (VPR) approaches have typically attempted to match places by identifying visual cues, image regions or landmarks that have high ``utility'' in identifying a specific place. But this concept of utility is not singular; rather, it can take a range of forms. In this paper, we present a novel approach to deduce two key types of utility for VPR: the utility of visual cues `specific' to an environment, and that of cues specific to a particular place. We employ contrastive learning principles to estimate both the environment- and place-specific utility of Vector of Locally Aggregated Descriptors (VLAD) clusters in an unsupervised manner, which is then used to guide local feature matching through keypoint selection. By combining these two utility measures, our approach achieves state-of-the-art performance on three challenging benchmark datasets, while simultaneously reducing the required storage and compute time. We provide further analysis demonstrating that unsupervised cluster selection yields semantically meaningful groupings, that finer-grained categorization often has higher utility for VPR than high-level semantic categorization (e.g. building, road), and we characterise how these two utility measures vary across different places and environments. Source code is made publicly available at https://github.com/Nik-V9/HEAPUtil.
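The keypoint-selection step described above can be illustrated with a minimal sketch: each local descriptor is assigned to its nearest VLAD cluster centroid, and only keypoints falling in high-utility clusters are retained for local feature matching. This is an assumption-laden illustration, not the paper's implementation; the function name, `keep_ratio` parameter, and the idea of using precomputed per-cluster utility scores are hypothetical simplifications.

```python
import numpy as np

def select_keypoints_by_utility(descriptors, centroids, cluster_utility, keep_ratio=0.5):
    """Keep keypoints whose VLAD cluster has high utility (illustrative sketch).

    descriptors:      (N, D) local descriptors, one per keypoint
    centroids:        (K, D) VLAD cluster centroids
    cluster_utility:  (K,) precomputed utility score per cluster (assumed given)
    keep_ratio:       fraction of keypoints to retain
    """
    # Assign each local descriptor to its nearest cluster centroid
    sq_dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assignments = sq_dists.argmin(axis=1)

    # Each keypoint inherits the utility of its assigned cluster
    keypoint_utility = cluster_utility[assignments]

    # Retain the top fraction of keypoints by inherited utility
    k = max(1, int(len(descriptors) * keep_ratio))
    keep_indices = np.argsort(-keypoint_utility)[:k]
    return np.sort(keep_indices)
```

In practice the utility scores would come from the unsupervised, contrastive estimation described in the paper; here they are simply treated as an input array.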