Two-view correspondence learning is a key task in computer vision that aims to establish reliable matching relationships for applications such as camera pose estimation and 3D reconstruction. However, existing methods have limitations in local geometric modeling and cross-stage information optimization, which make it difficult to accurately capture the geometric constraints of matched pairs and thus reduce the robustness of the model. To address these challenges, we propose a Multi-Graph Contextual Attention Network (MGCA-Net), which consists of a Contextual Geometric Attention (CGA) module and a Cross-Stage Multi-Graph Consensus (CSMGC) module. Specifically, CGA dynamically integrates spatial position and feature information via an adaptive attention mechanism, enhancing the capability to capture both local and global geometric relationships. Meanwhile, CSMGC establishes geometric consensus via a cross-stage sparse graph network, ensuring the consistency of geometric information across stages. Experimental results on two representative datasets, YFCC100M and SUN3D, show that MGCA-Net significantly outperforms existing state-of-the-art methods on the outlier rejection and camera pose estimation tasks. Source code is available at http://www.linshuyuan.com.