Dense correspondence across semantically related images has been extensively studied, but it still faces two challenges: 1) large variations in appearance, scale, and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor-intensive and infeasible to scale. Most existing approaches focus on designing various matching strategies on top of fully-supervised ImageNet-pretrained networks. On the other hand, while a variety of self-supervised approaches have been proposed to explicitly measure image-level similarities, correspondence matching at the pixel level remains under-explored. In this work, we propose a multi-level contrastive learning approach for semantic matching that does not rely on any ImageNet-pretrained model. We show that image-level contrastive learning is a key component that encourages the convolutional features to find correspondences between similar objects, while performance can be further enhanced by regularizing cross-instance cycle-consistency at intermediate feature levels. Experimental results on the PF-PASCAL, PF-WILLOW, and SPair-71k benchmark datasets demonstrate that our method performs favorably against state-of-the-art approaches. The source code and trained models will be made available to the public.
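To make the two ingredients named above concrete, the sketch below illustrates (a) an image-level InfoNCE contrastive loss over global embeddings and (b) a soft cross-image cycle-consistency penalty on intermediate convolutional feature maps. This is a minimal illustration under our own assumptions, not the paper's implementation; all function names, shapes, and temperature values are hypothetical.

```python
# Illustrative sketch (not the authors' code): image-level InfoNCE plus a
# cross-image cycle-consistency term on dense features. Names are hypothetical.
import torch
import torch.nn.functional as F


def image_level_infonce(z_a, z_b, temperature=0.07):
    """InfoNCE between two batches of global image embeddings (B, D), where
    z_a[i] and z_b[i] should match and all other pairs act as negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)


def make_grid_coords(H, W, device):
    """Normalized (x, y) coordinates for every location of an H x W grid."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=device),
        torch.linspace(-1, 1, W, device=device),
        indexing="ij",
    )
    return torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # (H*W, 2)


def cycle_consistency_loss(feat_a, feat_b, temperature=0.05):
    """Soft cycle-consistency between dense feature maps (C, H, W) of two
    images: each location in A is softly matched to B and back to A, and the
    round trip is penalized for drifting away from its starting location."""
    C, H, W = feat_a.shape
    fa = F.normalize(feat_a.reshape(C, -1), dim=0)        # (C, N), N = H*W
    fb = F.normalize(feat_b.reshape(C, -1), dim=0)

    sim_ab = fa.t() @ fb                                  # (N, N) A->B similarities
    p_ab = F.softmax(sim_ab / temperature, dim=1)         # soft matches A->B
    p_ba = F.softmax(sim_ab.t() / temperature, dim=1)     # soft matches B->A

    p_cycle = p_ab @ p_ba                                 # round-trip A->B->A
    coords = make_grid_coords(H, W, feat_a.device)        # (N, 2)
    returned = p_cycle @ coords                           # expected landing coords
    return F.mse_loss(returned, coords)
```

In this sketch the image-level term shapes the global embedding space, while the cycle term regularizes intermediate features so that matches between two different instances of the same category are spatially consistent; the actual losses, layers, and weighting used in the paper may differ.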