Most existing studies on learning local features focus on patch-based descriptions of individual keypoints, while neglecting the spatial relations established by their keypoint locations. In this paper, we go beyond local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates cross-modality contextual information, including (i) visual context from the high-level image representation, and (ii) geometric context from the 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews empirical hyper-parameter search and improves convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, yet yields remarkable improvements on several large-scale benchmarks with diversified scenes, demonstrating both strong practicality and generalization ability in geometric matching applications.
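To make the aggregation scheme and the loss concrete, below is a minimal PyTorch-style sketch, not the authors' implementation. It assumes per-keypoint raw descriptors plus hypothetical visual-context and geometric-context features (the names `ContextAugmenter`, `vis_ctx`, `geo_ctx`, and the layer sizes are illustrative), fused by a small MLP with a residual connection, and an N-pair-style loss in which every non-corresponding descriptor in the batch serves as a negative, so no margin hyper-parameter needs to be tuned; the exact formulation in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAugmenter(nn.Module):
    """Hypothetical fusion of raw descriptors with cross-modality context.

    Assumed per-keypoint inputs (names are illustrative, not from the paper):
      raw_desc : (N, D)  off-the-shelf local descriptor
      vis_ctx  : (N, Dv) visual context sampled from a high-level image feature map
      geo_ctx  : (N, Dg) geometric context encoded from the 2D keypoint layout
    """
    def __init__(self, d=128, dv=128, dg=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(d + dv + dg, 256), nn.ReLU(inplace=True),
            nn.Linear(256, d),
        )

    def forward(self, raw_desc, vis_ctx, geo_ctx):
        aug = self.fuse(torch.cat([raw_desc, vis_ctx, geo_ctx], dim=-1))
        # Residual augmentation keeps the raw descriptor as a strong prior.
        return F.normalize(raw_desc + aug, dim=-1)


def n_pair_loss(desc_a, desc_b):
    """N-pair-style loss: row i of desc_a corresponds to row i of desc_b; all
    other rows in the batch act as negatives, so no margin is required."""
    sim = desc_a @ desc_b.t()                              # (N, N) similarities
    labels = torch.arange(sim.size(0), device=sim.device)  # diagonal = matches
    # Symmetric softmax cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels))
```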