Visual localization estimates the camera pose of a query image by establishing correspondences between that image and a map. The task is computationally and data intensive, which makes it difficult to evaluate methods thoroughly across many datasets. To advance the field, however, we argue that robust visual localization algorithms should be evaluated on multiple datasets covering a broad variety of domains. To facilitate this, we introduce kapture, a new, flexible, unified data format and toolbox for visual localization and structure-from-motion (SFM). It enables easy use of different datasets as well as efficient and reusable data processing. To demonstrate this, we present a versatile pipeline for visual localization that supports different local and global features, 3D data (e.g., depth maps), non-vision sensor data (e.g., IMU, GPS, WiFi), and various processing algorithms. Using multiple configurations of the pipeline, we demonstrate the versatility of kapture in our experiments. Furthermore, we evaluate our methods on eight public datasets, where they rank among the top methods on all of them and first on many. To foster future research, we release code, models, and all datasets used in this paper in the kapture format as open source under a permissive BSD license: github.com/naver/kapture, github.com/naver/kapture-localization