Visual place recognition (VPR) is usually treated as a specific image retrieval problem. Limited by existing training frameworks, most deep learning-based methods cannot extract sufficiently stable global features from RGB images and rely on a time-consuming re-ranking step to exploit spatial structural information for better performance. In this paper, we propose StructVPR, a novel training architecture for VPR that enhances structural knowledge in RGB global features and thus improves feature stability in a constantly changing environment. Specifically, StructVPR feeds segmentation images into a CNN as a more definitive source of structural knowledge and applies knowledge distillation so that neither online segmentation nor inference of the seg-branch is needed at test time. Considering that not all samples contain high-quality and helpful knowledge, and that some even hurt distillation performance, we partition the samples and weight each sample's distillation loss to enhance the expected knowledge precisely. Finally, StructVPR achieves impressive performance on several benchmarks using only global retrieval and even outperforms many two-stage approaches by a large margin. With additional re-ranking, StructVPR achieves state-of-the-art performance while maintaining a low computational cost.
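The per-sample weighting of the distillation loss described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the rule that partitions samples by comparing the retrieval rank of the seg-branch (teacher) with that of the RGB branch (student), the weight values, and the use of a squared distance between L2-normalized global features.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sample_weight(seg_rank, rgb_rank, top_n=5):
    """Hypothetical partition rule: weight a sample by how useful the teacher is.

    seg_rank / rgb_rank are the ranks at which the seg-branch and the RGB
    branch retrieve a correct match (1 = best). The groups and weight values
    are illustrative assumptions, not the paper's settings.
    """
    seg_ok = seg_rank <= top_n
    rgb_ok = rgb_rank <= top_n
    if seg_ok and not rgb_ok:
        return 1.0  # teacher succeeds where the student fails: most helpful
    if seg_ok and rgb_ok:
        return 0.5  # both succeed: mildly helpful
    return 0.0      # teacher fails: skip distillation for this sample


def weighted_distillation_loss(student_feats, teacher_feats, weights):
    """Weighted mean of squared distances between normalized global features."""
    total = 0.0
    for s, t, w in zip(student_feats, teacher_feats, weights):
        s_n, t_n = l2_normalize(s), l2_normalize(t)
        total += w * sum((a - b) ** 2 for a, b in zip(s_n, t_n))
    weight_sum = sum(weights)
    return total / weight_sum if weight_sum > 0 else 0.0


# Usage: two samples with toy 2-D "global features".
student = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[0.0, 1.0], [0.0, 1.0]]
weights = [sample_weight(1, 10), sample_weight(1, 1)]
loss = weighted_distillation_loss(student, teacher, weights)
```

In this sketch, samples on which the teacher itself fails get a weight of zero, so they contribute nothing to the distillation term; a real training setup would combine such a term with the retrieval loss used to train the RGB branch.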