In this paper, we propose an end-to-end framework that jointly learns keypoint detection, descriptor representation, and cross-frame matching for the task of image-based 3D localization. Prior art has tackled each of these components individually, purportedly to alleviate the difficulty of effectively training a holistic network. We design a self-supervised image warping correspondence loss for both feature detection and matching, a weakly-supervised epipolar constraint loss on relative camera pose learning, and a directional matching scheme that detects keypoint features in a source image and performs a coarse-to-fine correspondence search in the target image. We leverage this framework to enforce cycle consistency in our matching module. In addition, we propose a new loss to robustly handle both definite inlier/outlier matches and less-certain matches. The integration of these learning mechanisms enables end-to-end training of a single network that performs all three localization components. Benchmarking our approach on public datasets shows that such an end-to-end framework yields more accurate localization, outperforming both traditional methods and state-of-the-art weakly supervised methods.
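To make the weakly-supervised epipolar term concrete, the following is a minimal sketch of how such a loss is commonly formulated; the exact residual and weighting used in our network may differ. Given a putative match $(\mathbf{x}_i, \mathbf{x}'_i)$ in homogeneous pixel coordinates, the fundamental matrix $F$ induced by the relative pose constrains corresponding points via $\mathbf{x}'^{\top}_i F\,\mathbf{x}_i = 0$, and a standard choice is the first-order Sampson approximation of the reprojection error:

$$
\mathcal{L}_{\text{epi}} \;=\; \frac{1}{N}\sum_{i=1}^{N}
\frac{\left(\mathbf{x}'^{\top}_i F\,\mathbf{x}_i\right)^2}
{(F\mathbf{x}_i)_1^2 + (F\mathbf{x}_i)_2^2 + (F^{\top}\mathbf{x}'_i)_1^2 + (F^{\top}\mathbf{x}'_i)_2^2},
$$

where $F = K'^{-\top}[\mathbf{t}]_\times R\,K^{-1}$ when camera intrinsics $K, K'$ and relative pose $(R, \mathbf{t})$ are available (the pose only weakly supervises the matches, since no per-pixel correspondence labels are required).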