Learning robust local image feature matching is a fundamental low-level vision task that has been widely explored in recent years. Recently, detector-free local feature matchers based on transformers have shown promising results, substantially outperforming purely Convolutional Neural Network (CNN) based ones. However, because attention is costly to learn, the correlations produced by transformer-based methods are spatially limited to the centers of the source views' coarse patches. In this work, we rethink this issue and find that such a matching formulation degrades pose estimation, especially for low-resolution images. We therefore propose a transformer-based cascade matching model, the Cascade feature Matching TRansformer (CasMTR), which efficiently learns dense feature correlations and allows us to choose more reliable matching pairs for relative pose estimation. Instead of re-training a new detector, we use a simple yet effective Non-Maximum Suppression (NMS) post-process to filter keypoints through the confidence map, which largely improves the matching precision. CasMTR achieves state-of-the-art performance in indoor and outdoor pose estimation as well as visual localization. Moreover, thorough ablations show the efficacy of the proposed components and techniques.
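To illustrate the NMS post-process mentioned above, the following is a minimal sketch of local-maximum suppression on a dense confidence map. The abstract does not specify the window size, the threshold, or any implementation detail, so the function name `nms_keypoints` and the parameters `radius` and `threshold` are assumptions made for illustration only, not the CasMTR implementation.

```python
import numpy as np


def nms_keypoints(conf, radius=2, threshold=0.2):
    """Keep pixels that are the maximum of `conf` inside their
    (2*radius+1) x (2*radius+1) neighbourhood and exceed `threshold`.

    `radius` and `threshold` are illustrative values (assumptions).
    Returns an (N, 2) integer array of (row, col) coordinates.
    """
    conf = np.asarray(conf, dtype=np.float64)
    k = 2 * radius + 1
    # Pad with -inf so border pixels are compared only against real values.
    padded = np.pad(conf, radius, mode="constant", constant_values=-np.inf)
    # View of shape (H, W, k, k): one k x k window per original pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    local_max = windows.max(axis=(-2, -1))
    # A pixel survives if it equals its neighbourhood maximum and is confident.
    keep = (conf == local_max) & (conf > threshold)
    return np.argwhere(keep)


if __name__ == "__main__":
    demo = np.zeros((5, 5))
    demo[1, 1] = 0.9  # strong peak
    demo[1, 2] = 0.8  # neighbour of the peak, expected to be suppressed
    demo[4, 4] = 0.5  # isolated weaker peak
    print(nms_keypoints(demo, radius=1, threshold=0.2))
```

In this sketch, each surviving coordinate would index one candidate match in the dense correlation output; how the retained keypoints are paired across views is outside what the abstract describes.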