Recently, many unsupervised hashing methods have been proposed to improve unsupervised image retrieval. These methods design a semantic similarity matrix based on the similarities between image features extracted by a pre-trained CNN model. However, most of them tend to ignore the high-level abstract semantic concepts contained in images. Intuitively, concepts play an important role in calculating the similarity between images: in real-world scenarios, each image is associated with some concepts, and two images are more similar if they share more concepts. Inspired by this intuition, we propose a novel method, Unsupervised Hashing with Semantic Concept Mining (UHSCM), which leverages a vision-language pretraining (VLP) model to construct a high-quality similarity matrix. Specifically, a set of randomly chosen concepts is first collected. Then, using a VLP model with prompt engineering, which has shown strong power in visual representation learning, the concept set is denoised according to the training images. Next, UHSCM applies the VLP model with prompting again to mine the concept distribution of each image and constructs a high-quality semantic similarity matrix from the mined concept distributions. Finally, with the semantic similarity matrix as guidance, a novel hashing loss with a regularization term based on a modified contrastive loss is proposed to optimize the hashing network. Extensive experiments on three benchmark datasets show that the proposed method outperforms state-of-the-art baselines on the image retrieval task.
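The step that turns mined concept distributions into a similarity matrix can be illustrated with a minimal sketch. The abstract does not specify the exact formulas, so everything here is an assumption for illustration: the function names `concept_distributions` and `similarity_matrix` are hypothetical, random arrays stand in for the image and concept-prompt embeddings a VLP model would produce, and the temperature-scaled softmax and cosine similarity are common choices rather than the method's confirmed definitions.

```python
import numpy as np


def concept_distributions(image_emb, concept_emb, temperature=0.07):
    """Softmax over image-to-concept cosine similarities (assumed form)."""
    # L2-normalise so the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    con = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
    logits = img @ con.T / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def similarity_matrix(dists):
    """Pairwise cosine similarity between concept distributions (assumed)."""
    unit = dists / np.linalg.norm(dists, axis=1, keepdims=True)
    return unit @ unit.T


# Placeholder embeddings: 5 images and 8 concepts in a 16-dim space.
# In practice these would come from a VLP model's image and text encoders.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(5, 16))
concept_emb = rng.normal(size=(8, 16))

P = concept_distributions(image_emb, concept_emb)  # shape (5, 8)
S = similarity_matrix(P)                           # shape (5, 5)
```

Under these assumptions each row of `P` sums to one, and `S` is symmetric with a unit diagonal, so two images that put mass on the same concepts receive a similarity close to one.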