Nearest neighbor search aims to retrieve the samples in a database that are closest to a given query, which is a fundamental problem in various domains such as computer vision, recommendation systems, and machine learning. Hashing is one of the most widely used methods for this task due to its computational and storage efficiency. With the development of deep learning, deep hashing methods have shown more advantages than traditional methods. In this survey, we investigate current deep hashing algorithms in detail, including deep supervised hashing and deep unsupervised hashing. Specifically, we categorize deep supervised hashing methods into pairwise methods, ranking-based methods, pointwise methods, and quantization, according to how they measure the similarities of the learned hash codes. Moreover, deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods, based on their semantic learning manners. We also introduce three related important topics: semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing. Meanwhile, we present some commonly used public datasets and the schemes used to measure the performance of deep hashing algorithms. Finally, we discuss some potential research directions in conclusion.
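To make the retrieval setting concrete, the following is a minimal sketch of nearest-neighbor search with binary hash codes, the setting all of the surveyed methods share once codes are learned. Each sample is stored as a short bit string, and similarity reduces to a Hamming distance (XOR plus popcount), which is why hashing is storage- and compute-efficient. All names and the 8-bit code length here are illustrative, not from any particular method in the survey.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-packed hash codes."""
    return bin(a ^ b).count("1")

def nearest(query_code: int, database_codes: list[int], k: int = 2) -> list[int]:
    """Return the indices of the k database codes closest to the query
    in Hamming distance (ties broken by database order)."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:k]

# Toy database of 8-bit codes; a learned hash function would produce these.
database = [0b10110010, 0b10110011, 0b01001100, 0b11110000]
query = 0b10110000
print(nearest(query, database))  # → [0, 3]: the codes differing in fewest bits
```

The methods surveyed below differ in how the hash function that produces such codes is learned (from pairwise labels, ranking information, pointwise labels, or unsupervised signals), not in this retrieval step itself.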