Robot localization remains a challenging task in GPS-denied environments. State estimation approaches based on local sensors, e.g., cameras or IMUs, are prone to drift over long-range missions as error accumulates. In this study, we address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce a cross-scale dataset and a methodology for producing additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains, underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; c) can be used directly in concert with state estimation pipelines.
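To make the retrieval formulation concrete, the sketch below shows one common way such geo-localization is posed: an image observation is embedded, compared against precomputed embeddings of map tiles, and the best-matching tile center is taken as the position estimate. This is an illustrative assumption about the setup, not the paper's actual pipeline; all names (localize, tile_embeddings, tile_centers) are hypothetical placeholders.

```python
# Hypothetical sketch: retrieval-style localization against a tiled 2D map.
# The embeddings here are random stand-ins for learned cross-scale features.
import numpy as np

def localize(obs_embedding: np.ndarray,
             tile_embeddings: np.ndarray,
             tile_centers: np.ndarray) -> tuple[np.ndarray, float]:
    """Match one observation embedding against precomputed map-tile embeddings.

    obs_embedding:   (d,)   embedding of the current camera image
    tile_embeddings: (n, d) embeddings of the n map tiles
    tile_centers:    (n, 2) 2D map coordinates of each tile center
    Returns the best-matching tile center and its cosine similarity.
    """
    # Cosine similarity = dot product of L2-normalized vectors.
    obs = obs_embedding / np.linalg.norm(obs_embedding)
    tiles = tile_embeddings / np.linalg.norm(tile_embeddings, axis=1, keepdims=True)
    scores = tiles @ obs  # (n,) similarity of the observation to each tile
    best = int(np.argmax(scores))
    return tile_centers[best], float(scores[best])

# Example with random data standing in for learned embeddings.
rng = np.random.default_rng(0)
pos, score = localize(rng.normal(size=128),
                      rng.normal(size=(1000, 128)),
                      rng.uniform(0, 500, size=(1000, 2)))
print(f"estimated position: {pos}, similarity: {score:.3f}")
```

In a full system, the returned position and similarity score could serve as a (pseudo-)measurement for a state estimation filter, which is consistent with the abstract's claim that the approach can operate in concert with state estimation pipelines.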