Generative semantic hashing is a promising technique for large-scale information retrieval thanks to its fast retrieval speed and small memory footprint. To keep training tractable, existing generative hashing methods mostly assume a factorized form for the posterior distribution, which enforces independence among the bits of a hash code. From the perspectives of both model representation and code-space size, independence is not always the best assumption. In this paper, to introduce correlations among the bits of hash codes, we propose to employ the distribution of a Boltzmann machine as the variational posterior. To address the resulting intractability of training, we first develop an approximate method to reparameterize the distribution of a Boltzmann machine by augmenting it as a hierarchical concatenation of a Gaussian-like distribution and a Bernoulli distribution. Based on that, an asymptotically exact lower bound is further derived for the evidence lower bound (ELBO). With these novel techniques, the entire model can be optimized efficiently. Extensive experimental results demonstrate that, by effectively modeling correlations among different bits within a hash code, our model achieves significant performance gains.
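To make the contrast between a factorized posterior and a Boltzmann-machine posterior concrete, the following minimal sketch enumerates a tiny 3-bit distribution exactly. It assumes the textbook Boltzmann-machine form p(s) ∝ exp(½ sᵀWs + bᵀs) over s ∈ {0,1}ⁿ, which may differ from the exact parameterization used in the paper. The helper names `boltzmann_pmf` and `covariance` and the weight values are hypothetical, chosen only for illustration. Brute-force enumeration is feasible only for very small n, which is why a reparameterization and a tractable bound are needed at realistic code lengths.

```python
import itertools
import math


def boltzmann_pmf(W, b):
    """Exact pmf of p(s) proportional to exp(0.5 * s^T W s + b^T s), s in {0,1}^n.

    Assumed textbook parameterization; computed by enumerating all 2^n states,
    so this is only usable for very small n.
    """
    n = len(b)
    states = list(itertools.product([0, 1], repeat=n))
    weights = []
    for s in states:
        quad = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
        lin = sum(b[i] * s[i] for i in range(n))
        weights.append(math.exp(0.5 * quad + lin))
    Z = sum(weights)  # partition function (normalizing constant)
    return {s: w / Z for s, w in zip(states, weights)}


def covariance(pmf, i, j):
    """Covariance between bit i and bit j under the given pmf."""
    mean_i = sum(p * s[i] for s, p in pmf.items())
    mean_j = sum(p * s[j] for s, p in pmf.items())
    mean_ij = sum(p * s[i] * s[j] for s, p in pmf.items())
    return mean_ij - mean_i * mean_j


b = [0.0, 0.0, 0.0]

# With W = 0 the distribution factorizes into independent Bernoulli bits.
W_indep = [[0.0] * 3 for _ in range(3)]

# Illustrative coupling between bit 0 and bit 1 only (hypothetical values).
W_coupled = [
    [0.0, 2.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

factorized = boltzmann_pmf(W_indep, b)
coupled = boltzmann_pmf(W_coupled, b)

print("cov(bit0, bit1), factorized:", covariance(factorized, 0, 1))
print("cov(bit0, bit1), coupled:   ", covariance(coupled, 0, 1))
print("cov(bit0, bit2), coupled:   ", covariance(coupled, 0, 2))
```

In this toy setting the off-diagonal entries of `W` are what let two bits co-vary. Setting them to zero recovers the factorized (independent-bit) case that existing methods assume.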