Visual Place Recognition is an essential component of camera localization and loop-closure detection systems, and it has attracted widespread interest in multiple domains such as computer vision, robotics, and AR/VR. In this work, we propose a faster, lighter, and stronger approach that produces models with fewer parameters and spends less time in the inference stage. We design RepVGG-lite as the backbone network of our architecture; it is more discriminative than other general-purpose networks on the Place Recognition task while retaining a clear speed advantage. In the feature extraction stage, we extract only single-scale patch-level descriptors from the global descriptors. We then design a trainable, attention-based feature matcher that exploits both the spatial relationships of the features and their visual appearance. Comprehensive experiments on challenging benchmark datasets demonstrate that the proposed method outperforms other recent state-of-the-art learned approaches while achieving even higher inference speed. Our system has 14 times fewer parameters than Patch-NetVLAD, 6.8 times lower theoretical FLOPs, and runs 21 and 33 times faster in feature extraction and feature matching, respectively. Moreover, our approach is 0.5\% better than Patch-NetVLAD in Recall@1. We use subsets of the Mapillary Street Level Sequences dataset to conduct experiments for all other challenging conditions.
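To make the matcher description concrete, the following is a minimal sketch (not the authors' exact implementation) of an attention-based patch-descriptor matcher: patch positions are embedded so that spatial relationships, not only visual appearance, influence the matching, and cross-attention refines each image's descriptors against the other's before a similarity matrix is computed. The descriptor dimension, number of heads, and position encoding are illustrative assumptions.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttentionMatcher(nn.Module):
    """Cross-attention over two sets of patch-level descriptors (illustrative)."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Encode 2-D patch positions so that spatial layout contributes
        # alongside visual appearance.
        self.pos_enc = nn.Linear(2, dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, desc_a, pos_a, desc_b, pos_b):
        # desc_*: (B, N, dim) patch descriptors; pos_*: (B, N, 2) patch centers.
        a = desc_a + self.pos_enc(pos_a)
        b = desc_b + self.pos_enc(pos_b)
        # Each image's descriptors attend to the other image's descriptors.
        a_att, _ = self.cross_attn(a, b, b)
        b_att, _ = self.cross_attn(b, a, a)
        a = F.normalize(self.proj(a_att), dim=-1)
        b = F.normalize(self.proj(b_att), dim=-1)
        # (B, Na, Nb) similarity matrix between the refined descriptors.
        return a @ b.transpose(1, 2)


if __name__ == "__main__":
    matcher = CrossAttentionMatcher()
    da, db = torch.randn(1, 64, 256), torch.randn(1, 64, 256)
    pa, pb = torch.rand(1, 64, 2), torch.rand(1, 64, 2)
    print(matcher(da, pa, db, pb).shape)  # torch.Size([1, 64, 64])
\end{verbatim}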