In this paper we address the task of visual place recognition (VPR), where the goal is to retrieve, from a large geotagged gallery, the GPS coordinates of the place depicted in a given query image. While recent works have shown that building descriptors that incorporate both semantic and appearance information is beneficial, current state-of-the-art methods opt for a top-down definition of the relevant semantic content. Here we present the first VPR algorithm that learns robust global embeddings from both the visual appearance and the semantic content of the data, with the segmentation process dynamically guided by place recognition through a multi-scale attention module. Experiments on a variety of scenarios validate this new approach and demonstrate its performance against state-of-the-art methods. Finally, we propose the first synthetic-world dataset suited for both place recognition and segmentation tasks.
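To make the descriptor-building idea concrete, below is a minimal NumPy sketch of attention-weighted pooling at multiple scales, the general mechanism an attention-guided global embedding relies on. This is an illustrative toy, not the paper's actual architecture: the function name `multi_scale_attention_pool`, the use of random feature maps, and the per-scale softmax attention are all assumptions made for the example; in the real system the features and attention logits would come from learned network layers.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a flat vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_scale_attention_pool(feature_maps, attn_logits):
    """Pool each scale's feature map with its spatial attention map,
    then concatenate the per-scale vectors and L2-normalise them
    into a single global descriptor.

    feature_maps: list of arrays of shape (C, H_s, W_s), one per scale.
    attn_logits:  list of arrays of shape (H_s, W_s), raw attention scores.
    """
    pooled = []
    for feats, logits in zip(feature_maps, attn_logits):
        c = feats.shape[0]
        a = softmax(logits.reshape(-1))          # spatial attention, sums to 1
        pooled.append(feats.reshape(c, -1) @ a)  # attention-weighted average, (C,)
    desc = np.concatenate(pooled)
    return desc / np.linalg.norm(desc)           # unit-norm global descriptor

# Toy example with two scales and 8 channels (stand-ins for CNN features).
rng = np.random.default_rng(0)
fmaps = [rng.standard_normal((8, 4, 4)), rng.standard_normal((8, 2, 2))]
attns = [rng.standard_normal((4, 4)), rng.standard_normal((2, 2))]
d = multi_scale_attention_pool(fmaps, attns)
print(d.shape)  # (16,)
```

Because the descriptor is L2-normalised, retrieval against the gallery reduces to a nearest-neighbour search under cosine similarity (equivalently, a dot product between unit vectors).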