In the era of big data, methods for improving memory and computational efficiency have become crucial to deploying technologies successfully. Hashing is one of the most effective approaches to the computational limitations that come with big data. One natural formulation of this problem is spectral hashing, which directly incorporates affinity to learn binary codes. However, the binary constraints make the optimization intractable. To mitigate this challenge, various relaxation approaches have been proposed that reduce the computational load of obtaining binary codes while still attaining a good solution. The problem with all existing relaxation methods is that they resort to one or more auxiliary variables to attain high-quality binary codes while relaxing the problem. These auxiliary variables lead to a coordinate descent approach, which increases the computational complexity. We argue that introducing these variables is unnecessary. To this end, we propose a novel relaxed formulation of spectral hashing that adds no variables to the problem. Furthermore, instead of solving the problem in the original space, where the number of variables equals the number of data points, we solve it in a much smaller space and retrieve the binary codes from this solution. This trick reduces both memory and computational complexity. We apply two optimization techniques, namely projected gradient and optimization on manifolds, to obtain the solution. In comprehensive experiments on four public datasets, we show that the proposed efficient spectral hashing (ESH) algorithm achieves highly competitive retrieval performance compared with the state of the art at low complexity.
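To make the ideas in the abstract concrete, the following is a minimal sketch of a generic relaxed spectral hashing problem solved by projected gradient in a reduced space. It is not the ESH formulation, which the abstract does not spell out. The sketch assumes a Gaussian affinity, a linear embedding `X @ W` with a `d x k` matrix `W` (so the number of variables depends on the feature dimension rather than the number of data points), and an orthonormality constraint on `W` enforced by an SVD-based projection. All function names and parameters (`gaussian_affinity`, `project_orthonormal`, `relaxed_codes`, `sigma`, `n_iters`) are hypothetical and chosen for illustration.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix (an assumed choice of affinity)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def project_orthonormal(W):
    """Closest matrix with orthonormal columns (polar factor via thin SVD)."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def relaxed_codes(X, n_bits, n_iters=200, seed=0):
    """Minimize tr(W^T M W) subject to W^T W = I, then binarize X @ W.

    M = X^T L X is d x d, so the optimization variable W is d x n_bits
    instead of n x n_bits. Requires n_bits <= d.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center so bits are roughly balanced
    A = gaussian_affinity(Xc)
    L = np.diag(A.sum(axis=1)) - A           # unnormalized graph Laplacian
    M = Xc.T @ L @ Xc                        # symmetric positive semidefinite
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) + 1e-12)
    W = project_orthonormal(rng.standard_normal((X.shape[1], n_bits)))
    for _ in range(n_iters):
        grad = 2.0 * M @ W                   # gradient of tr(W^T M W)
        W = project_orthonormal(W - step * grad)
    B = np.where(Xc @ W >= 0, 1, -1)         # relaxed solution -> binary codes
    return B, W
```

A call such as `B, W = relaxed_codes(X, n_bits=4)` returns an `n x 4` matrix of codes in {-1, +1} together with the projection matrix. The dense affinity matrix built here still costs memory quadratic in the number of points, so this sketch illustrates only the smaller variable space and the projection step, not the memory savings claimed in the abstract.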