Visual Place Recognition (VPR) in areas with visually similar scenes, such as urban or indoor scenarios, is a major challenge. Existing VPR methods based on global descriptors have difficulty capturing locally specific regions (LSRs) in a scene and are therefore prone to localization confusion in such scenarios. Finding the LSRs that are critical for place recognition thus becomes key. To address this challenge, we introduce Patch-NetVLAD+, inspired by patch-based VPR research. Our method proposes a fine-tuning strategy with a triplet loss to make NetVLAD suitable for extracting patch-level descriptors. Moreover, unlike existing methods that treat all patches in an image equally, our method identifies patches from LSRs, which appear less frequently throughout the dataset, and makes them play an important role in VPR by assigning them higher weights. Experiments on the Pittsburgh30k and Tokyo247 datasets show that our approach achieves up to a 6.35\% performance improvement over existing patch-based methods.
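The two ingredients named above can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the function names, the Euclidean-distance triplet loss, and the inverse-frequency weighting scheme are all assumptions standing in for the actual Patch-NetVLAD+ pipeline.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.1):
    # Hinge-style triplet loss on patch descriptors: pull the positive
    # (same place) closer to the anchor than the negative (different place)
    # by at least `margin`. The margin value here is illustrative.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def rarity_weights(patch_assignments):
    # Assumed inverse-frequency weighting: patches whose (hypothetical)
    # cluster label appears less often across the dataset are treated as
    # locally specific regions and receive higher weight.
    counts = np.bincount(patch_assignments)
    freq = counts[patch_assignments] / len(patch_assignments)
    w = 1.0 / freq
    return w / w.sum()  # normalize so the weights sum to 1
```

For example, with cluster assignments `[0, 0, 0, 1]` the lone patch of cluster 1 receives half of the total weight, so a rare (and presumably discriminative) region dominates the aggregated image descriptor.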