Conventional techniques for establishing dense correspondences across visually or semantically similar images focused on designing a task-specific matching prior, which is difficult to model. To overcome this, recent learning-based methods have instead attempted to learn a good matching prior within the model itself from large training data. The performance improvement was apparent, but the need for sufficient training data and intensive learning hinders their applicability. Moreover, using a fixed model at test time ignores the fact that each pair of images may require its own prior, which limits performance and generalization to unseen images. In this paper, we show that an image pair-specific prior can be captured solely by optimizing an untrained matching network on a single input pair of images. Tailored to such test-time optimization for dense correspondence, we present a residual matching network and a confidence-aware contrastive loss that guarantee meaningful convergence. Experiments demonstrate that our framework, dubbed Deep Matching Prior (DMP), is competitive with, or even outperforms, the latest learning-based methods on several benchmarks for geometric and semantic matching, even though it requires neither large training data nor intensive learning. With pre-trained networks, DMP attains state-of-the-art performance on all benchmarks.