Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by directly aligning low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing step. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at https://github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.
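To make the featuremetric objective concrete, here is a minimal two-view sketch in PyTorch, assuming dense feature maps have already been extracted by a CNN for each image. This is an illustration only, not the paper's implementation: pixel-perfect-sfm optimizes multi-view tracks with a robust solver inside COLMAP, while the function names, optimizer, and hyperparameters below (sample_features, adjust_keypoints, Adam, steps, lr) are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def sample_features(fmap, xy):
    """Bilinearly sample a dense feature map at sub-pixel locations.

    fmap: (C, H, W) feature map; xy: (N, 2) pixel coordinates.
    Returns (N, C) feature vectors.
    """
    C, H, W = fmap.shape
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    grid = torch.stack([
        2.0 * xy[:, 0] / (W - 1) - 1.0,
        2.0 * xy[:, 1] / (H - 1) - 1.0,
    ], dim=-1).view(1, -1, 1, 2)
    feats = F.grid_sample(fmap[None], grid, align_corners=True)  # (1, C, N, 1)
    return feats[0, :, :, 0].t()  # (N, C)

def adjust_keypoints(fmap_a, fmap_b, kpts_a, kpts_b, steps=100, lr=0.05):
    """Refine matched keypoints by minimizing a featuremetric error.

    kpts_a, kpts_b: (N, 2) float tensors of matched keypoint locations.
    """
    offsets_a = torch.zeros_like(kpts_a, requires_grad=True)
    offsets_b = torch.zeros_like(kpts_b, requires_grad=True)
    opt = torch.optim.Adam([offsets_a, offsets_b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        fa = sample_features(fmap_a, kpts_a + offsets_a)
        fb = sample_features(fmap_b, kpts_b + offsets_b)
        # Featuremetric residual: difference of dense features at the
        # (refined) keypoint locations, rather than a reprojection error.
        loss = (fa - fb).pow(2).sum(dim=-1).mean()
        loss.backward()
        opt.step()
    return kpts_a + offsets_a.detach(), kpts_b + offsets_b.detach()
```

The key design choice is bilinear interpolation of the feature maps, which makes the cost differentiable in the keypoint coordinates and thus allows sub-pixel refinement of detections that were originally quantized to the feature-map grid.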