Large viewpoint variation and irrelevant content around the target always hinder accurate image retrieval and its subsequent tasks. In this paper, we investigate an extremely challenging task: given a ground-view image of a landmark, we aim to achieve cross-view geo-localization by retrieving its corresponding satellite-view images. Specifically, the challenge comes from the gap between the ground view and the satellite view, which includes not only large viewpoint changes (some parts of the landmark may become invisible when moving from the front view to the top view) but also highly irrelevant background (the target landmark tends to be hidden among surrounding buildings), making it difficult to learn a common representation or a suitable mapping. To address this issue, we use drone-view information as a bridge between the ground-view and satellite-view domains. We propose a Peer Learning and Cross Diffusion (PLCD) framework. PLCD consists of three parts: 1) peer learning across the ground view and the drone view, which finds visible parts to benefit ground-drone cross-view representation learning; 2) a patch-based network for satellite-drone cross-view representation learning; 3) cross diffusion between the ground-drone space and the satellite-drone space. Extensive experiments conducted on the University-Earth and University-Google datasets show that our method significantly outperforms state-of-the-art methods.
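The idea of using the drone view as a bridge can be illustrated with a toy sketch. This is not the PLCD cross diffusion itself, whose details the abstract does not give; it is a simplified stand-in that assumes each drone image already has one embedding in a ground-drone space and another in a satellite-drone space. The function name `bridge_rank`, the softmax temperature, and all vectors below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def softmax(row, temperature=0.1):
    """Turn similarities into weights that sum to 1 (temperature is an assumption)."""
    top = max(row)
    exps = [math.exp((x - top) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def bridge_rank(ground_query, drones_ground_space, drones_sat_space, satellites):
    """Rank satellite images for a ground query by passing through drone images.

    drones_ground_space[i] and drones_sat_space[i] are two embeddings of the
    same drone image, one per learned space (hypothetical setup).
    """
    # Step 1: how strongly the ground query matches each drone image.
    weights = softmax([cosine(ground_query, d) for d in drones_ground_space])
    # Step 2: propagate those weights to satellites via drone-satellite similarity.
    scores = [
        sum(w * cosine(d, s) for w, d in zip(weights, drones_sat_space))
        for s in satellites
    ]
    # Highest score first.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

# Toy data: the query resembles drone 0, and drone 0 resembles satellite 1.
ground_query = [1.0, 0.0]
drones_ground_space = [[0.9, 0.1], [0.0, 1.0]]
drones_sat_space = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
satellites = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

ranking = bridge_rank(ground_query, drones_ground_space, drones_sat_space, satellites)
print(ranking)
```

In this sketch the ground query never has to be compared with a satellite image directly; the drone images carry the match from one space to the other, which is the role the abstract assigns to the drone view.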