In this paper, we adopt a mutual information (MI) maximization approach to the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH). First, to learn informative representations that preserve both intra- and inter-modal similarities, we leverage recent advances in variational lower-bound estimation of MI to maximize the MI between the binary representations and the input features, as well as between the binary representations of different modalities. By jointly maximizing these MI terms under the assumption that the binary representations follow multivariate Bernoulli distributions, we can learn binary representations that preserve both intra- and inter-modal similarities efficiently, in a mini-batch manner with gradient descent. Furthermore, we find that minimizing the modality gap by forcing the binary representations of the same instance to be similar across modalities can yield less informative representations, since modality-private information is discarded. Hence, balancing the reduction of the modality gap against the loss of modality-private information is important for cross-modal retrieval tasks. Quantitative evaluations on standard benchmark datasets demonstrate that the proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
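To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of the two ingredients the abstract names: Bernoulli-distributed binary codes trained with a straight-through gradient estimator, and an InfoNCE-style variational lower bound on MI, here applied between the codes of two modalities over a mini-batch. The encoder names (f_img, f_txt), the temperature tau, and all dimensions are illustrative assumptions.

```python
# Sketch only: Bernoulli-relaxed binary codes + a variational (InfoNCE)
# lower bound on MI, maximized between modalities within a mini-batch.
import torch
import torch.nn.functional as F

def bernoulli_codes(logits):
    """Sample {-1, +1} codes from Bernoulli(sigmoid(logits)), using a
    straight-through estimator so gradients flow back to the logits."""
    probs = torch.sigmoid(logits)
    hard = torch.bernoulli(probs) * 2.0 - 1.0   # discrete {-1, +1} sample
    soft = probs * 2.0 - 1.0                    # smooth surrogate
    return soft + (hard - soft).detach()        # straight-through trick

def infonce_mi_lower_bound(z_a, z_b, tau=0.5):
    """InfoNCE estimate, a variational lower bound on I(z_a; z_b);
    paired instances in the batch sit on the diagonal (positives)."""
    logits = z_a @ z_b.t() / (tau * z_a.shape[1])
    labels = torch.arange(z_a.shape[0], device=z_a.device)
    return -F.cross_entropy(logits, labels)     # larger value = tighter bound

# Toy usage with N paired image/text instances (hypothetical encoders).
N, d_img, d_txt, bits = 32, 512, 300, 64
f_img = torch.nn.Linear(d_img, bits)
f_txt = torch.nn.Linear(d_txt, bits)
x_img, x_txt = torch.randn(N, d_img), torch.randn(N, d_txt)

b_img = bernoulli_codes(f_img(x_img))
b_txt = bernoulli_codes(f_txt(x_txt))
# Maximize the inter-modal MI bound by minimizing its negative; the paper
# additionally maximizes intra-modal MI between each code and its input
# feature, which would add analogous terms here.
loss = -infonce_mi_lower_bound(b_img, b_txt)
loss.backward()
```

Because the bound is computed per mini-batch, all terms are amenable to standard stochastic gradient descent, which is what makes the joint MI maximization tractable at scale.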