Hashing is a widely adopted technique for nearest neighbor search in large-scale image retrieval. Recent research has shown that leveraging supervised information can lead to high-quality hashing. However, the cost of annotating data is often an obstacle when applying supervised hashing to a new domain. Moreover, the results can lack robustness because the training and test data may come from similar but different distributions. This paper explores generating synthetic data through semi-supervised generative adversarial networks (GANs), which leverage abundant unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence, for a better understanding of the statistical structures of natural data. We demonstrate that the above two limitations can be well mitigated by applying the synthetic data to hashing. Specifically, we present a novel deep semantic hashing with GANs (DSH-GANs), which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations into hash codes, and a classification stream. The whole architecture is trained end-to-end by jointly optimizing three losses: an adversarial loss to correctly label each sample as synthetic or real, a triplet ranking loss to preserve the relative similarity ordering in the input real-synthetic triplets, and a classification loss to classify each sample accurately. Extensive experiments on both the CIFAR-10 and NUS-WIDE image benchmarks validate the capability of exploiting synthetic images for hashing. Our framework also achieves superior results compared to state-of-the-art deep hash models.
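The abstract names a triplet ranking loss and a joint objective over three losses but does not give their formulas. As a rough illustration only, the sketch below uses a common margin-based hinge form on squared Euclidean distances between relaxed (real-valued) hash codes, and combines the three terms as a weighted sum. The function names, the margin, the loss weights, and the 0.5 binarization threshold are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two relaxed hash codes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(
    query: Sequence[float],
    similar: Sequence[float],
    dissimilar: Sequence[float],
    margin: float = 1.0,  # hypothetical value, not from the paper
) -> float:
    """Hinge-style triplet loss.

    The loss is zero once the dissimilar code is at least `margin`
    farther from the query than the similar code is.
    """
    gap = squared_distance(query, similar) - squared_distance(query, dissimilar)
    return max(0.0, margin + gap)


def joint_loss(
    adversarial: float,
    triplet: float,
    classification: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical
) -> float:
    """Weighted sum of the three loss terms (an assumed way to combine them)."""
    w_adv, w_tri, w_cls = weights
    return w_adv * adversarial + w_tri * triplet + w_cls * classification


def binarize(code: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn a relaxed code into bits by thresholding (assumed scheme)."""
    return [1 if x >= threshold else 0 for x in code]


# Toy 4-bit relaxed codes: a real query image, a similar image
# (which could be synthetic), and a dissimilar image.
query = [0.9, 0.1, 0.8, 0.2]
similar = [0.8, 0.2, 0.9, 0.1]
dissimilar = [0.1, 0.9, 0.2, 0.7]

loss_default_margin = triplet_ranking_loss(query, similar, dissimilar)
loss_large_margin = triplet_ranking_loss(query, similar, dissimilar, margin=2.0)
query_bits = binarize(query)
```

With the default margin the toy triplet is already ordered correctly, so the loss is zero; raising the margin makes the same triplet incur a small penalty, which is how the margin controls how strictly the similarity ordering is enforced.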