Finding dense semantic correspondence is a fundamental problem in computer vision, which remains challenging in complex scenes due to background clutter, extreme intra-class variation, and a severe lack of ground-truth annotations. In this paper, we address the challenge of label sparsity in semantic correspondence by enriching the supervision signal obtained from sparse keypoint annotations. To this end, we first propose a teacher-student learning paradigm for generating dense pseudo-labels and then develop two novel strategies for denoising them. First, we use spatial priors around the sparse annotations to suppress noisy pseudo-labels. Second, we introduce a loss-driven dynamic label selection strategy. We instantiate our paradigm with two variants of learning strategies: a single offline teacher setting and a mutual online teachers setting. Our approach achieves notable improvements on three challenging benchmarks for semantic correspondence and establishes a new state of the art. Project page: https://shuaiyihuang.github.io/publications/SCorrSAN.
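The two denoising ideas named in the abstract, a spatial prior around the sparse keypoints and loss-driven label selection, can be illustrated with a small sketch. The abstract does not give their exact form, so everything below is an assumption made for illustration: the circular neighborhood, the `radius` and `keep_ratio` parameters, the small-loss criterion, and the function names `spatial_prior_mask` and `select_small_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np


def spatial_prior_mask(height, width, keypoints, radius):
    """Return a boolean (height, width) mask that is True within `radius`
    pixels of any annotated keypoint, given as (x, y) pairs.

    Assumption: a circular neighborhood stands in for the spatial prior.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for kx, ky in keypoints:
        mask |= (xs - kx) ** 2 + (ys - ky) ** 2 <= radius ** 2
    return mask


def select_small_loss(per_pixel_loss, candidate_mask, keep_ratio):
    """Keep the `keep_ratio` fraction of candidate pixels with the smallest loss.

    Assumption: low loss is used as a proxy for a reliable pseudo-label.
    """
    selected = np.zeros(candidate_mask.shape, dtype=bool)
    idx = np.flatnonzero(candidate_mask)  # flat (row-major) indices of candidates
    if idx.size == 0:
        return selected
    n_keep = max(1, int(round(keep_ratio * idx.size)))
    losses = per_pixel_loss.ravel()[idx]
    keep = idx[np.argsort(losses, kind="stable")[:n_keep]]
    selected.flat[keep] = True
    return selected


# Toy example: one keypoint at (x=2, y=2) on an 8x8 grid.
mask = spatial_prior_mask(8, 8, [(2, 2)], radius=1)
loss = np.arange(64, dtype=float).reshape(8, 8)  # placeholder per-pixel loss
selected = select_small_loss(loss, mask, keep_ratio=0.4)
```

In this toy setting the mask covers the keypoint and its four direct neighbors, and the selection keeps the two candidates with the lowest placeholder loss. In a training loop, a mask like `selected` would decide which teacher-generated pseudo-labels contribute to the student's loss.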