When reasoning about tasks that involve large amounts of data, a common approach is to represent data items as objects in the Hamming space, where operations can be done efficiently and effectively. Object similarity can then be computed by learning binary representations (hash codes) of the objects and computing their Hamming distance. While this is highly efficient, each bit dimension is equally weighted, which means that potentially discriminative information of the data is lost. A more expressive alternative is to use real-valued vector representations and compute their inner product; this allows varying the weight of each dimension but is many orders of magnitude slower. To fix this, we derive a new way of measuring the dissimilarity between two objects in the Hamming space with binary weighting of each dimension (i.e., disabling bits): we consider a field-agnostic dissimilarity that projects the vector of one object onto the vector of the other. When working in the Hamming space, this results in a novel projected Hamming dissimilarity, which, by choice of projection, effectively allows a binary importance weighting of the hash code of one object through the hash code of the other. We propose a variational hashing model for learning hash codes optimized for this projected Hamming dissimilarity, and experimentally evaluate it in collaborative filtering experiments. The resultant hash codes lead to effectiveness gains of up to +7% in NDCG and +14% in MRR compared to state-of-the-art hashing-based collaborative filtering baselines, while requiring no additional storage and no computational overhead compared to using the Hamming distance.
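To make the contrast concrete, the sketch below compares the two standard similarity computations described above: the Hamming distance over hash codes (every bit weighted equally) and the inner product over real-valued vectors (per-dimension weights, but far more costly at scale). It also includes a masked Hamming variant that illustrates the general idea of one hash code enabling or disabling bit dimensions of the other; this masked form, the code length m, and the {0,1} bit convention are assumptions made purely for illustration and are not the paper's exact definition of the projected Hamming dissimilarity.

```python
# Minimal illustrative sketch (not the paper's exact formulation).
import numpy as np

m = 16  # hypothetical code length

def hamming_distance(p: np.ndarray, q: np.ndarray) -> int:
    """Standard Hamming distance over {0,1}^m: every bit dimension counts equally."""
    return int(np.sum(p != q))

def inner_product(u: np.ndarray, v: np.ndarray) -> float:
    """Real-valued inner product: each dimension can carry its own weight,
    but it cannot be reduced to cheap bit operations on hash codes."""
    return float(np.dot(u, v))

def masked_hamming(p: np.ndarray, q: np.ndarray) -> int:
    """ASSUMED illustration of binary importance weighting: bits of q contribute
    only where the corresponding bit of p is set, so p effectively enables or
    disables dimensions of q. This is not the paper's exact formula."""
    mask = p.astype(bool)
    return int(np.sum(p[mask] != q[mask]))

rng = np.random.default_rng(0)
p = rng.integers(0, 2, size=m)
q = rng.integers(0, 2, size=m)
print(hamming_distance(p, q), masked_hamming(p, q))
```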