Many unsupervised hashing methods are implicitly built on the idea of reconstructing the input data, which essentially encourages the hash codes to retain as much information about the original data as possible. However, this requirement may force the models to spend much of their capacity on reconstructing uninformative background content while failing to preserve the discriminative semantic information that matters more for the hashing task. To tackle this problem, inspired by the recent success of contrastive learning in learning continuous representations, we propose to adapt this framework to learn binary hash codes. Specifically, we first modify the objective function to meet the specific requirements of hashing, and then introduce a probabilistic binary representation layer into the model to enable end-to-end training of the entire model. We further establish a strong connection between the proposed contrastive-learning-based hashing method and mutual information, and show that the proposed model can be viewed under the broader framework of the information bottleneck (IB). From this perspective, a more general hashing model follows naturally. Extensive experimental results on three benchmark image datasets demonstrate that the proposed hashing method significantly outperforms existing baselines.
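To make the two key ingredients concrete, the following is a minimal sketch (not the authors' code) of a probabilistic binary representation layer trained end-to-end via a straight-through gradient estimator, together with a contrastive (NT-Xent-style) loss applied to the resulting codes of two augmented views. Names such as `ProbBinaryLayer`, the code dimension, and the temperature value are illustrative assumptions rather than the paper's notation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProbBinaryLayer(nn.Module):
    """Maps real-valued features to stochastic {-1, +1} codes.

    Forward pass: sample b ~ Bernoulli(sigmoid(h)) and map {0, 1} -> {-1, +1}.
    Backward pass: straight-through estimator, i.e. gradients flow through
    tanh(h) as if the sampling step were the identity.
    """

    def forward(self, h):
        prob = torch.sigmoid(h)
        sample = torch.bernoulli(prob) * 2.0 - 1.0           # hard codes in {-1, +1}
        soft = torch.tanh(h)                                  # differentiable surrogate
        return soft + (sample - soft).detach()                # straight-through trick


def contrastive_hash_loss(z1, z2, temperature=0.3):
    """NT-Xent-style loss on binary codes of two augmented views, each of shape [N, code_dim]."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                            # [2N, code_dim]
    sim = z @ z.t() / temperature                             # pairwise similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))                     # exclude self-similarity
    # positives: view i pairs with view i + n (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 64))
    binarize = ProbBinaryLayer()
    x1, x2 = torch.randn(8, 512), torch.randn(8, 512)         # features of two augmented views
    loss = contrastive_hash_loss(binarize(encoder(x1)), binarize(encoder(x2)))
    loss.backward()                                            # end-to-end trainable despite binarization
    print(loss.item())
```

The straight-through surrogate is one common way to backpropagate through a discrete sampling step; the paper's actual probabilistic layer and the exact form of its hashing-specific contrastive objective may differ from this sketch.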