Are Local Features All You Need for Cross-Domain Visual Place Recognition?

Visual Place Recognition is the task of predicting the coordinates of an image (called the query) based solely on visual cues. Most commonly, a retrieval approach is adopted, where the query is matched to the most similar images from a large database of geotagged photos, using learned global descriptors. Despite recent advances, recognizing the same place when the query comes from a significantly different distribution is still a major hurdle for state-of-the-art retrieval methods. Examples include heavy illumination changes (e.g., night-time images) and substantial occlusions (e.g., transient objects). In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges, following the intuition that local descriptors are inherently more robust than global features to domain shifts. To this end, we provide a new, comprehensive benchmark of current state-of-the-art models. We also introduce two new demanding datasets with night and occluded queries, to be matched against a city-wide database. Code and datasets are available at https://github.com/gbarbarani/re-ranking-for-VPR.
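
To make the two-stage idea concrete, the following is a minimal sketch of retrieval with global descriptors followed by re-ranking with spatial verification. It is an illustration under stated assumptions, not the code from the repository above: the function names (`retrieve`, `mutual_matches`, `count_inliers`, `rerank`) are hypothetical, the descriptors are plain NumPy arrays rather than outputs of a learned model, and the geometric check is a translation-only RANSAC standing in for the homography or fundamental-matrix fit that a real spatial-verification method would typically use.

```python
import numpy as np


def retrieve(query_desc, db_descs, k=5):
    """Return indices of the k database images closest to the query
    under cosine similarity of their global descriptors."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:k]


def mutual_matches(desc_a, desc_b):
    """Mutual nearest-neighbour matches between two sets of local descriptors.
    Returns the matched row indices into desc_a and desc_b."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = dists.argmin(axis=1)  # best match in b for each a
    nn_ba = dists.argmin(axis=0)  # best match in a for each b
    idx_a = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx_a  # keep only pairs that agree both ways
    return idx_a[keep], nn_ab[keep]


def count_inliers(pts_a, pts_b, n_iters=200, tol=3.0, seed=0):
    """RANSAC with a translation-only model (a deliberate simplification).
    Returns the largest number of matches consistent with a single shift."""
    if len(pts_a) == 0:
        return 0
    rng = np.random.default_rng(seed)
    best = 0
    for _ in range(n_iters):
        i = rng.integers(len(pts_a))   # one match defines a translation
        shift = pts_b[i] - pts_a[i]
        err = np.linalg.norm(pts_a + shift - pts_b, axis=1)
        best = max(best, int((err < tol).sum()))
    return best


def rerank(candidates, query_local, db_local):
    """Re-order retrieved candidates by their spatially verified match count.
    query_local is (keypoints, descriptors); db_local maps index -> same pair."""
    q_kpts, q_desc = query_local
    scores = []
    for c in candidates:
        kpts, desc = db_local[int(c)]
        ia, ib = mutual_matches(q_desc, desc)
        scores.append(count_inliers(q_kpts[ia], kpts[ib]))
    order = np.argsort(-np.asarray(scores), kind="stable")
    return [int(candidates[i]) for i in order]
```

In this sketch the global descriptor only has to place the correct image somewhere in the top-k shortlist; the local-feature stage then decides the final order. That split is the intuition the abstract describes: a domain shift may push the correct image down the global ranking, while geometrically consistent local matches can still pull it back to the top.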
