Recent research has focused on using convolutional neural networks (CNNs) as backbones in two-view correspondence learning, demonstrating significant superiority over methods based on multilayer perceptrons. However, CNN backbones that are not tailored to the task may fail to aggregate global context effectively and may oversmooth dense motion fields in scenes with large disparity. To address these problems, we propose a novel network named SC-Net, which effectively integrates bilateral context from both spatial and channel perspectives. Specifically, we design an adaptive focused regularization module (AFR) to enhance the model's position-awareness and robustness against spurious motion samples, thereby facilitating the generation of a more accurate motion field. We then propose a bilateral field adjustment module (BFA) to refine the motion field by simultaneously modeling long-range relationships and facilitating interaction across spatial and channel dimensions. Finally, we recover the motion vectors from the refined field using a position-aware recovery module (PAR) that ensures consistency and precision. Extensive experiments demonstrate that SC-Net outperforms state-of-the-art methods in relative pose estimation and outlier removal tasks on the YFCC100M and SUN3D datasets. Source code is available at http://www.linshuyuan.com.