We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a large corpus of visual measurements (e.g., images, point clouds). Under this formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other. This analysis naturally leads to our second contribution: the SGP algorithm, which performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms: a teacher that performs robust model fitting given learned features to generate geometric pseudo-labels, and a student that performs deep feature learning under the noisy supervision of those pseudo-labels. As a third contribution, we apply SGP to two perception problems on large-scale real datasets, namely relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on par with or superior to the supervised oracles trained using ground-truth labels.
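The abstract gives no implementation details, so the following is only a toy sketch of the teacher/student alternation, not the authors' SGP code. Assumptions (all hypothetical): a single scalar `s` shared across scenes stands in for the learned feature descriptor, a per-scene offset `t_i` stands in for each geometric model, inliers satisfy `b = s * a + t_i` while the remaining points are gross outliers (wrong matches), the teacher is a median fit (which minimizes the absolute residual over `t_i` with `s` fixed), and the student is one trimmed least-squares step on `s` with the pseudo-labels fixed. Because the two blocks use different losses, this is a block-coordinate heuristic rather than exact minimization of one objective.

```python
import random
import statistics


def make_scenes(num_scenes=20, pts=50, s_true=2.0, outlier_frac=0.3, seed=0):
    """Hypothetical toy data: in scene i, inliers satisfy b = s_true * a + t_i."""
    rng = random.Random(seed)
    scenes, t_true = [], []
    for _ in range(num_scenes):
        t = rng.uniform(-5.0, 5.0)
        a_vals, b_vals = [], []
        for _ in range(pts):
            a = rng.uniform(-1.0, 1.0)
            if rng.random() < outlier_frac:
                b = rng.uniform(-20.0, 20.0)  # gross outlier (a wrong "match")
            else:
                b = s_true * a + t + rng.gauss(0.0, 0.01)  # noisy inlier
            a_vals.append(a)
            b_vals.append(b)
        scenes.append((a_vals, b_vals))
        t_true.append(t)
    return scenes, t_true


def teacher(scenes, s):
    """Robust model fitting with s fixed: the median minimizes sum |b - s*a - t| over t."""
    return [statistics.median(b - s * a for a, b in zip(a_vals, b_vals))
            for a_vals, b_vals in scenes]


def student(scenes, t_pseudo, s, keep_frac=0.5):
    """Fit s with the (noisy) pseudo-labels fixed, using only the best-agreeing points."""
    rows = []
    for (a_vals, b_vals), t in zip(scenes, t_pseudo):
        for a, b in zip(a_vals, b_vals):
            # (residual under the current s, regressor a, target b - t)
            rows.append((abs(b - s * a - t), a, b - t))
    rows.sort()
    kept = rows[: max(1, int(keep_frac * len(rows)))]
    # Closed-form least squares for target = s * a on the kept points.
    num = sum(a * y for _r, a, y in kept)
    den = sum(a * a for _r, a, _y in kept)
    return num / den


def alternate(scenes, s_init=0.0, iters=10):
    """Alternate the two blocks: teacher produces pseudo-labels, student updates s."""
    s = s_init
    for _ in range(iters):
        t_pseudo = teacher(scenes, s)
        s = student(scenes, t_pseudo, s)
    return s, teacher(scenes, s)


scenes, t_true = make_scenes()
s_hat, t_hat = alternate(scenes)
print(f"s_hat = {s_hat:.3f}")
print(f"max |t_hat - t_true| = {max(abs(x - y) for x, y in zip(t_hat, t_true)):.3f}")
```

Neither block sees ground-truth labels: the teacher only uses the current shared parameter, and the student only uses the teacher's pseudo-labels, discarding the points that disagree most with them.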