Establishing dense correspondences across semantically similar images is challenging because of significant intra-class variations and background clutter. Numerous methods have been proposed to address these problems, but they learn the feature extractor or the cost aggregation module independently, which yields sub-optimal performance. In this paper, we propose a novel framework that jointly learns feature extraction and cost aggregation for semantic correspondence. By exploiting the pseudo labels produced by each module, the networks consisting of feature extraction and cost aggregation modules are learned simultaneously in a boosting fashion. Moreover, to ignore unreliable pseudo labels, we present a confidence-aware contrastive loss function for learning the networks in a weakly-supervised manner. We demonstrate competitive results on standard benchmarks for semantic correspondence.
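To make the idea of ignoring unreliable pseudo labels concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes each module outputs a matrix of matching scores (rows are source positions, columns are candidate target positions), that a pseudo label is the highest-scoring target with its softmax probability as the confidence, and that labels below a fixed threshold are dropped. The function names, the temperature, and the threshold are all hypothetical.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of matching scores."""
    peak = max(row)
    exps = [math.exp((s - peak) / temperature) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_labels(teacher_scores, temperature=0.1):
    """For each source position, return (best target index, its softmax probability)."""
    labels = []
    for row in teacher_scores:
        probs = softmax(row, temperature)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append((best, probs[best]))
    return labels


def confidence_aware_contrastive_loss(student_scores, labels,
                                      threshold=0.5, temperature=0.1):
    """InfoNCE-style loss on the student's scores.

    Pseudo labels whose confidence is below `threshold` are skipped
    (an assumed masking rule, chosen here only for illustration).
    """
    total, kept = 0.0, 0
    for row, (target, confidence) in zip(student_scores, labels):
        if confidence < threshold:
            continue  # unreliable pseudo label: contributes nothing
        probs = softmax(row, temperature)
        total += -math.log(probs[target])
        kept += 1
    return total / kept if kept else 0.0


# Hypothetical scores: 2 source positions, 3 candidate targets each.
feature_scores = [[0.9, 0.1, 0.0], [0.4, 0.5, 0.45]]
aggregated_scores = [[0.8, 0.2, 0.1], [0.5, 0.5, 0.45]]

# Labels from the aggregation output supervise the feature output;
# the second row is ambiguous, so its label is ignored.
labels = pseudo_labels(aggregated_scores)
loss = confidence_aware_contrastive_loss(feature_scores, labels)
```

Under the same assumptions, swapping the two arguments would let the feature scores supervise the aggregation output in turn, which is one way to read the mutual, boosting-style training described above.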