We introduce Flatlandia, a novel problem for visual localization of an image from object detections composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects in respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (\eg via surveying or crowd-sourced). Such maps are also more storage-friendly than standard large-scale 3D models often used in visual localization while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.
翻译:我们介绍了 Flatlandia,这是一个新问题,用于从由两个特定任务组成的对象检测中找到图像的视觉定位:i)粗略地在物体地标的 2D 地图上定位观察到的单个图像;ii)细粒度的 3DoF 定位:估计图像在 2D 地图内的纬度、经度和方向。这些新任务的解决方案利用了开放的城市地图,这些地图用 GPS 定位常见对象的位置标注(例如通过测量或众包)。与通常用于视觉定位的大规模 3D 模型相比,这些地图还更加存储友好,而且还保护隐私。由于现有数据集不适用于所提出的问题,因此我们提供了 Flatlandia 数据集,旨在在多个欧洲城市的城市环境中进行 3DoF 视觉定位,并基于从众包数据中获得。我们使用 Flatlandia 数据集来验证所提出任务的复杂程度。