Recently, methods based on Convolutional Neural Networks (CNNs) have gained popularity in the field of visual place recognition (VPR). In particular, features from the middle layers of CNNs are more robust to drastic appearance changes than handcrafted features and high-layer features. Unfortunately, holistic mid-layer features lack robustness to large viewpoint changes. Here we split the holistic mid-layer features into local features and propose an adaptive dynamic time warping (DTW) algorithm that aligns the local features in the spatial domain while measuring the distance between two images. This enables viewpoint-invariant and condition-invariant place recognition. Meanwhile, a local matching DTW (LM-DTW) algorithm is applied to perform image sequence matching based on temporal alignment, which achieves further improvement while keeping linear time complexity. We perform extensive experiments on five representative VPR datasets. The results show that the proposed method significantly improves CNN-based methods. Moreover, our method outperforms several state-of-the-art methods while maintaining good run-time performance. This work provides a novel way to boost the VPR performance of CNN-based methods without any re-training. The code is available at https://github.com/Lu-Feng/STA-VPR.
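For concreteness, the sketch below illustrates how a DTW-style alignment can compare two images represented as sequences of local features (e.g., vertical slices of a mid-layer CNN feature map). It is a minimal, non-adaptive DTW with cosine distance; the function names and the length normalization are illustrative assumptions, not the released STA-VPR implementation.

```python
import numpy as np

def cosine_distance(x, y):
    """Cosine distance between two local feature vectors."""
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)

def dtw_image_distance(feats_a, feats_b):
    """Align two sequences of local features with standard DTW and return a distance.

    feats_a: (m, d) array of m local features from image A
    feats_b: (n, d) array of n local features from image B
    """
    m, n = len(feats_a), len(feats_b)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = cosine_distance(feats_a[i - 1], feats_b[j - 1])
            # DTW recurrence: best of diagonal (match), vertical, or horizontal step
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Normalize by sequence lengths so images with different numbers of slices compare fairly
    return D[m, n] / (m + n)
```

The paper's adaptive DTW and the LM-DTW sequence matcher build on the same dynamic-programming recurrence, applied in the spatial and temporal domains respectively.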