Hashing has been widely used for approximate nearest neighbor search in large-scale image retrieval. Given semantic annotations of the training data, such as class labels and pairwise similarities, hashing methods can learn to generate effective and compact binary codes. Because newly introduced images may carry semantic labels undefined at training time, which we call unseen images, zero-shot hashing techniques have been studied. However, existing zero-shot hashing methods focus on the retrieval of single-label images and cannot handle multi-label images. In this paper, a novel transductive zero-shot hashing method is proposed, for the first time, for multi-label unseen image retrieval. To predict the labels of the unseen/target data, a visual-semantic bridge is built via instance-concept coherence ranking on the seen/source data. Pairwise similarity loss and focal quantization loss are then constructed to train a hashing model on both the seen/source and unseen/target data. Extensive evaluations on three popular multi-label datasets demonstrate that the proposed hashing method achieves significantly better results than the competing methods.
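To make the two training objectives named above concrete, here is a minimal numpy sketch of one plausible form of a pairwise similarity loss and a focal quantization loss. The specific formulas (cross-entropy on scaled inner products, and a focal-weighted penalty pushing relaxed code entries toward ±1) are illustrative assumptions, not the paper's exact definitions; the function names and the `gamma` parameter are hypothetical.

```python
import numpy as np

def pairwise_similarity_loss(h, S, eps=1e-7):
    """Cross-entropy between pairwise code agreement and the similarity matrix.

    h: (n, b) relaxed binary codes with entries in [-1, 1]
    S: (n, n) pairwise similarity labels in {0, 1}
    (Illustrative form; the paper's exact loss may differ.)
    """
    inner = h @ h.T / h.shape[1]        # normalized inner products in [-1, 1]
    p = (inner + 1.0) / 2.0             # map agreement to a probability in [0, 1]
    return -np.mean(S * np.log(p + eps) + (1 - S) * np.log(1 - p + eps))

def focal_quantization_loss(h, gamma=2.0, eps=1e-7):
    """Focal-weighted penalty on the gap between relaxed codes and {-1, +1}.

    Entries already close to ±1 (p near 1) are down-weighted by (1 - p)^gamma,
    so training focuses on the hard-to-binarize entries.
    (Illustrative form; gamma is a hypothetical hyperparameter.)
    """
    p = np.clip(np.abs(h), eps, 1.0)    # "binariness" of each entry
    return np.mean((1.0 - p) ** gamma * -np.log(p))
```

With perfectly binary codes the quantization loss vanishes, and similar pairs with matching codes incur near-zero similarity loss, which matches the intended behavior of both terms.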