Finding correspondences between images or 3D scans is at the heart of many computer vision and image retrieval applications and is often enabled by matching local keypoint descriptors. Various learning approaches have been applied in the past to different stages of the matching pipeline, considering detector, descriptor, or metric learning objectives. These objectives were typically addressed separately, and most previous work has focused on image data. This paper proposes an end-to-end learning framework for keypoint detection and its descriptor representation for 3D depth maps or 3D scans, where the two can be jointly optimized towards task-specific objectives without the need for separate annotations. We employ a Siamese architecture augmented by a sampling layer and a novel score loss function, which in turn affects the selection of region proposals. Positive and negative examples are obtained automatically by sampling corresponding region proposals based on their consistency with known 3D pose labels. Matching experiments with depth data on multiple benchmark datasets demonstrate the efficacy of the proposed approach, showing significant improvements over state-of-the-art methods.
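The Siamese training scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: the linear `describe` embedding stands in for the learned network over depth patches, and the `margin` value and synthetic patch vectors are assumptions; in the paper, positive/negative labels come from consistency with known 3D pose labels rather than the simulated pairs used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared "descriptor network": both branches of the Siamese pair use
# the same weights W (weight sharing is the defining property of a
# Siamese architecture; a real model would be a CNN over depth patches).
D_IN, D_OUT = 64, 32
W = rng.normal(scale=0.1, size=(D_IN, D_OUT))

def describe(patch):
    """Embed a flattened local patch into a unit-norm descriptor."""
    d = patch @ W
    return d / (np.linalg.norm(d) + 1e-8)

def contrastive_loss(d1, d2, is_match, margin=1.0):
    """Hinge-style contrastive loss: pull matching descriptors together,
    push non-matching ones at least `margin` apart."""
    dist = np.linalg.norm(d1 - d2)
    return dist**2 if is_match else max(0.0, margin - dist)**2

# Positive pair: two noisy observations of the same surface region
# (in the paper, correspondence is verified via 3D pose labels).
base = rng.normal(size=D_IN)
pos_a = base + 0.01 * rng.normal(size=D_IN)
pos_b = base + 0.01 * rng.normal(size=D_IN)
neg = rng.normal(size=D_IN)  # unrelated region -> negative example

l_pos = contrastive_loss(describe(pos_a), describe(pos_b), is_match=True)
l_neg = contrastive_loss(describe(pos_a), describe(neg), is_match=False)
```

Minimizing the sum of such pair losses drives descriptors of corresponding regions together and separates non-corresponding ones; the paper's score loss additionally feeds back into which region proposals are selected, which this sketch omits.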