Establishing a sparse set of keypoint correspondences between images is a fundamental task in many computer vision pipelines. Often, this translates into a computationally expensive nearest neighbor search, where every keypoint descriptor in one image must be compared with all the descriptors in the others. In order to lower the computational cost of the matching phase, we propose a deep feature extraction network capable of detecting a predefined number of complementary sets of keypoints in each image. Since only the descriptors within the same set need to be compared across the different images, the computational complexity of the matching phase decreases with the number of sets. We train our network to predict the keypoints and compute the corresponding descriptors jointly. In particular, in order to learn complementary sets of keypoints, we introduce a novel unsupervised loss which penalizes intersections among the different sets. Additionally, we propose a novel descriptor-based weighting scheme meant to penalize the detection of keypoints with non-discriminative descriptors. With extensive experiments we show that our feature extraction network, trained only on synthetically warped images and in a fully unsupervised manner, achieves competitive results on 3D reconstruction and re-localization tasks at a reduced matching complexity.
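To illustrate why matching only within complementary sets reduces complexity, the following minimal sketch (not the authors' implementation) compares brute-force mutual nearest-neighbor matching with set-wise matching. The descriptors are assumed to be NumPy arrays, and the per-keypoint set labels are hypothetical outputs of the detection network.

```python
import numpy as np

def match_full(desc_a, desc_b):
    """Brute-force mutual nearest-neighbor matching: O(N_a * N_b) comparisons."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = dists.argmin(axis=1)           # best match in B for each descriptor of A
    nn_ba = dists.argmin(axis=0)           # best match in A for each descriptor of B
    mutual = nn_ba[nn_ab] == np.arange(len(desc_a))
    return np.stack([np.arange(len(desc_a))[mutual], nn_ab[mutual]], axis=1)

def match_by_set(desc_a, sets_a, desc_b, sets_b, num_sets):
    """Match only descriptors that belong to the same complementary set.

    With N keypoints split into K roughly equal sets, the cost drops from
    about N^2 comparisons to about K * (N/K)^2 = N^2 / K.
    """
    matches = []
    for k in range(num_sets):
        ia = np.flatnonzero(sets_a == k)
        ib = np.flatnonzero(sets_b == k)
        if len(ia) == 0 or len(ib) == 0:
            continue
        local = match_full(desc_a[ia], desc_b[ib])
        matches.append(np.stack([ia[local[:, 0]], ib[local[:, 1]]], axis=1))
    return np.concatenate(matches, axis=0) if matches else np.empty((0, 2), int)
```

Under these assumptions, doubling the number of complementary sets roughly halves the number of descriptor comparisons, which is the source of the reduced matching complexity claimed above.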