Supervised deep learning-based hashing and vector quantization enable fast, large-scale image retrieval systems. By fully exploiting label annotations, they achieve outstanding retrieval performance compared to conventional methods. However, assigning precise labels to a vast amount of training data is painstaking, and the annotation process is error-prone. To tackle these issues, we propose the first deep unsupervised image retrieval method, dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner. We design a Cross Quantized Contrastive learning strategy that jointly learns codewords and deep visual descriptors by comparing individually transformed images (views). Our method analyzes image contents to extract descriptive features, allowing us to understand image representations for accurate retrieval. Through extensive experiments on benchmarks, we demonstrate that the proposed method yields state-of-the-art results even without supervised pretraining.
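The cross-quantized contrastive idea described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the paper's implementation: the function names (`soft_quantize`, `cross_quantized_similarity`), the codebook shapes, and the temperature `alpha` are all illustrative. A descriptor is split into sub-vectors, each sub-vector is softly assigned to the codewords of its product-quantization codebook, and the similarity is measured *across* views: the raw feature of one view against the quantized feature of the other, so that the encoder and the codewords are optimized jointly.

```python
import numpy as np

def soft_quantize(x, codebook, alpha=10.0):
    """Softly assign sub-vector x (d,) to codewords in codebook (K, d).

    A softmax over negative squared distances replaces the hard argmin,
    making the assignment differentiable in a deep-learning framework.
    """
    d2 = np.sum((codebook - x) ** 2, axis=1)      # squared distance to each codeword
    w = np.exp(-alpha * d2)
    w = w / w.sum()                                # soft assignment weights
    return w @ codebook                            # soft (reconstructed) codeword

def cross_quantized_similarity(z_a, z_b, codebooks, alpha=10.0):
    """Cross-quantized cosine similarity between two view descriptors.

    z_a, z_b: (M*d,) descriptors of two augmented views of an image.
    codebooks: list of M arrays of shape (K, d), one per sub-space.
    """
    M = len(codebooks)
    sub_a, sub_b = np.split(z_a, M), np.split(z_b, M)
    q_a = np.concatenate([soft_quantize(s, C, alpha) for s, C in zip(sub_a, codebooks)])
    q_b = np.concatenate([soft_quantize(s, C, alpha) for s, C in zip(sub_b, codebooks)])

    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Compare each view's raw feature with the OTHER view's quantized feature.
    return 0.5 * (cos(z_a, q_b) + cos(z_b, q_a))
```

In a full training loop, this cross similarity would feed a contrastive (e.g. InfoNCE-style) loss over a batch, pulling together views of the same image while pushing apart other images; the sketch above only shows the quantization and cross-view comparison step.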