Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do not optimally leverage the feature hierarchies learned in a Convolutional Neural Network (CNN), especially when applied to the task of geometric feature matching. While a metric loss applied to the deepest layer of a CNN is often expected to yield ideal features irrespective of the task, in fact the growing receptive field and striding effects make shallower features better at high-precision matching tasks. We leverage this insight, together with explicit supervision at multiple levels of the feature hierarchy for better regularization, to learn more effective descriptors for geometric matching tasks. Further, we propose to use activation maps at different layers of a CNN as an effective and principled replacement for the multi-resolution image pyramids often used for matching tasks. We propose concrete CNN architectures employing these ideas and evaluate them on multiple datasets for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across datasets.
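The idea of supervising several levels of the feature hierarchy can be illustrated with a small sketch. The abstract does not name the metric loss, so the code below assumes a standard contrastive loss on Euclidean descriptor distance. The function names (`contrastive_loss`, `hierarchical_loss`), the `margin` and `weights` parameters, and the toy inputs are all hypothetical and chosen only for illustration. Descriptors are assumed to have already been sampled from each layer's activation map at corresponding keypoint locations.

```python
import numpy as np


def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Contrastive loss over N descriptor pairs (assumed loss, not from the source).

    desc_a, desc_b: (N, D) arrays of descriptors at paired locations.
    is_match: (N,) array, 1 for a true correspondence, 0 for a non-match.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    is_match = np.asarray(is_match, dtype=float)

    # Euclidean distance between each pair of descriptors.
    dist = np.linalg.norm(desc_a - desc_b, axis=1)

    # Matches are pulled together; non-matches are pushed beyond the margin.
    pos_term = is_match * dist ** 2
    neg_term = (1.0 - is_match) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pos_term + neg_term))


def hierarchical_loss(feats_a, feats_b, is_match, weights=None, margin=1.0):
    """Weighted sum of contrastive losses, one per supervised layer.

    feats_a, feats_b: lists of (N, D_l) arrays, one entry per layer
    (for example a shallow and a deep layer of the same network).
    """
    if weights is None:
        weights = [1.0] * len(feats_a)
    return sum(
        w * contrastive_loss(fa, fb, is_match, margin)
        for w, fa, fb in zip(weights, feats_a, feats_b)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([1, 1, 0, 0])
    # Toy stand-ins for descriptors from a shallow (8-D) and a deep (16-D) layer.
    shallow_a, shallow_b = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    deep_a, deep_b = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
    total = hierarchical_loss([shallow_a, deep_a], [shallow_b, deep_b], labels)
    print(f"total loss over two levels: {total:.4f}")
```

In this sketch every supervised layer receives its own loss term, so shallow features are trained directly for matching rather than only through gradients flowing back from the deepest layer. The per-level weights are a free design choice here, not a setting taken from the paper.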