Visual place recognition is the task of recognizing the place depicted in an image from its visual appearance alone, without metadata. The challenges lie not only in changes in lighting conditions, camera viewpoint, and scale, but also in the characteristics of scene-level images and the distinctive features of each area. Addressing these challenges requires considering both the local discriminativeness and the global semantic context of images. The diversity of datasets is also important for developing more general models and advancing the field. In this paper, we present a fully automated system for city-scale place recognition based on content-based image retrieval. Our contributions are threefold. First, we provide a comprehensive analysis of visual place recognition and outline the challenges that distinguish it from general image retrieval tasks. Second, we propose a simple pooling approach on top of convolutional neural network activations that embeds spatial information into the image representation vector. Third, we introduce new datasets for place recognition, which are essential for application-oriented research. In addition, through extensive experiments, we analyze and discuss various issues in both image retrieval and place recognition to offer insights into improving the performance of retrieval models in practice. The dataset used in this paper can be found at https://github.com/canhld94/Daejeon520