In this paper, we aim to establish accurate dense correspondences between a pair of images with overlapping fields of view under challenging illumination variations, viewpoint changes, and style differences. Through an extensive ablation study of state-of-the-art correspondence networks, we discovered, surprisingly, that the widely adopted 4D correlation tensor and its associated learning and processing modules could be de-parameterised and removed from training with only a minor impact on the final matching accuracy. Disabling these computationally expensive modules dramatically speeds up training and allows a 4 times larger batch size, which in turn compensates for the accuracy drop. Together with a multi-GPU inference stage, our method enables a systematic investigation of the relationship between matching accuracy and the up-sampling resolution of the native test images, from 1280 to 4K. This leads to the discovery of an optimal resolution $\mathbb{X}$ at which the proposed network achieves matching accuracy surpassing state-of-the-art methods on public benchmarks, particularly in the lower error bands.
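The 4D correlation tensor mentioned above stores one similarity score for every pair of feature locations in the two images, so its size grows as $H_A W_A H_B W_B$. That growth is what makes the modules operating on it expensive. The sketch below is an illustration only, not the authors' code. It assumes cosine similarity between L2-normalised CNN descriptors, a form commonly used in correspondence networks. `correlation_4d` and `mutual_nn_matches` are hypothetical names. The mutual-nearest-neighbour step is one simple parameter-free way to read matches off the tensor. It is not claimed to be the procedure used in the paper.

```python
import numpy as np


def correlation_4d(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Dense 4D correlation between two feature maps of shape (C, H, W).

    Returns corr with corr[i, j, k, l] = cosine similarity between the
    descriptor feat_a[:, i, j] and the descriptor feat_b[:, k, l].
    """
    c, ha, wa = feat_a.shape
    c_b, hb, wb = feat_b.shape
    assert c == c_b, "feature maps must have the same channel count"
    # Flatten the spatial grid and L2-normalise each descriptor (column).
    a = feat_a.reshape(c, ha * wa)
    b = feat_b.reshape(c, hb * wb)
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + 1e-8)
    # A single matrix product yields all pairwise similarities: (Ha*Wa, Hb*Wb).
    corr = a.T @ b
    return corr.reshape(ha, wa, hb, wb)


def mutual_nn_matches(corr: np.ndarray):
    """Parameter-free matching: keep cell pairs that are each other's best match."""
    ha, wa, hb, wb = corr.shape
    flat = corr.reshape(ha * wa, hb * wb)
    best_b = flat.argmax(axis=1)  # best cell in B for every cell in A
    best_a = flat.argmax(axis=0)  # best cell in A for every cell in B
    idx_a = np.arange(ha * wa)
    mutual = best_a[best_b] == idx_a
    # Convert flat indices back to (row, col) grid coordinates.
    src = np.stack(np.unravel_index(idx_a[mutual], (ha, wa)), axis=1)
    dst = np.stack(np.unravel_index(best_b[mutual], (hb, wb)), axis=1)
    return src, dst


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 5, 6))
corr = correlation_4d(feats, feats)
src, dst = mutual_nn_matches(corr)
print(corr.shape, len(src))
```

Matching a feature map against itself recovers the identity correspondence. The memory cost shows why these modules dominate training. For two 100×100 feature grids, the tensor already holds $10^8$ entries, about 400 MB in float32, before any learned filtering is applied to it.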