Precise estimation of global orientation and location is critical to a compelling outdoor Augmented Reality (AR) experience. We address geo-pose estimation by cross-view matching of query ground images against a geo-referenced database of aerial satellite images. Recently, neural-network-based methods have achieved state-of-the-art performance in cross-view matching. However, most prior work focuses only on location estimation and ignores orientation, which falls short of the requirements of outdoor AR applications. We propose a new transformer-based neural network model and a modified triplet ranking loss for joint location and orientation estimation. Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance. Furthermore, we extend the single-image-query geo-localization approach by incorporating temporal information from a navigation pipeline for robust, continuous geo-localization. Experiments on several large-scale real-world video sequences demonstrate that our approach enables high-precision and stable AR insertion.
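The abstract mentions a modified triplet ranking loss but does not specify its form. As context, cross-view geo-localization methods commonly train retrieval embeddings with a soft-margin triplet loss that pulls a ground-image embedding toward its matching aerial embedding and pushes it away from non-matching ones. The sketch below shows that generic loss only; the scaling factor `alpha` and the use of Euclidean distance are illustrative assumptions, not the paper's exact joint location-and-orientation formulation.

```python
import numpy as np

def soft_margin_triplet_loss(anchor, positive, negative, alpha=10.0):
    """Generic soft-margin triplet ranking loss:
        log(1 + exp(alpha * (d_pos - d_neg))).
    `anchor` is a ground-image embedding; `positive` is its matching
    aerial embedding; `negative` is a non-matching aerial embedding.
    `alpha` is an assumed scaling hyperparameter. This is an
    illustrative sketch, not the authors' modified loss.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to the true match
    d_neg = np.linalg.norm(anchor - negative)  # distance to a non-match
    return np.log1p(np.exp(alpha * (d_pos - d_neg)))
```

The loss approaches zero once the matching pair is much closer than the non-matching pair, and grows roughly linearly in the margin violation otherwise, which is what drives the ranking behavior used at retrieval time.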