Multi-label recognition (MLR) with incomplete labels is a challenging task. Recent works exploit the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations. Despite their promising performance, they generally overlook the valuable prior on the label-to-label correspondence. In this paper, we advocate remedying the deficiency of label supervision in MLR with incomplete labels by deriving a structured semantic prior on the label-to-label correspondence via a semantic prior prompter. We then present a novel Semantic Correspondence Prompt Network (SCPNet), which can thoroughly explore the structured semantic prior. A Prior-Enhanced Self-Supervised Learning method is further introduced to enhance the use of the prior. Comprehensive experiments and analyses on several widely used benchmark datasets show that our method significantly outperforms existing methods on all datasets, demonstrating its effectiveness and superiority. Our code will be available at https://github.com/jameslahm/SCPNet.