We propose a new method for estimating the relative pose between two images, in which keypoint detection, descriptor extraction, matching and robust pose estimation are learned jointly. While our architecture follows the traditional pose estimation pipeline from geometric computer vision, all steps, including feature matching, are learned in an end-to-end fashion. We demonstrate our method on the task of visually localizing a query image within a database of images with known pose. Pairwise pose estimation has many practical applications in robotic mapping, navigation, and AR. For example, displaying persistent AR objects in a scene relies on precise camera localization so that the digital models appear anchored to the physical environment. We train our pipeline end-to-end specifically for the problem of visual localization, and we evaluate the proposed approach in terms of localization accuracy, robustness and runtime. Our method achieves state-of-the-art localization accuracy on the 7 Scenes dataset.
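For reference, the sketch below illustrates the classical (non-learned) two-view pipeline that our architecture mirrors: keypoint detection, descriptor extraction, matching, and robust relative pose estimation via the essential matrix. It uses off-the-shelf OpenCV components (ORB, brute-force matching, RANSAC) purely as an assumed baseline for illustration; it is not the learned, end-to-end method described above.

```python
# Minimal sketch of the classical detect -> describe -> match -> robust-pose
# pipeline, using standard OpenCV components. This is an illustrative baseline,
# not the paper's learned, end-to-end architecture.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate rotation R and translation direction t between two grayscale images,
    given the camera intrinsic matrix K."""
    # 1. Keypoint detection and descriptor extraction
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # 2. Descriptor matching (mutual nearest neighbours via cross-check)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 3. Robust relative pose: essential matrix with RANSAC, then decomposition
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # translation is recovered only up to scale
```

In the learned pipeline, each of these hand-engineered stages is replaced by a differentiable counterpart so that the whole chain can be trained end-to-end for the localization objective.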