Nearest neighbor search aims to find the data points in a database whose distances to a given query are the smallest, a fundamental problem in many domains such as computer vision, recommendation systems, and machine learning. Hashing is one of the most widely used approaches because of its computational and storage efficiency. With the development of deep learning, deep hashing methods have shown more advantages than traditional methods. In this paper, we present a comprehensive survey of deep hashing algorithms, including deep supervised hashing and deep unsupervised hashing. Specifically, we categorize deep supervised hashing methods into pairwise methods, ranking-based methods, pointwise methods, and quantization, according to how the similarities of the learned hash codes are measured. Moreover, we categorize deep unsupervised hashing into similarity reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods, based on how they learn semantics. We also introduce three related important topics: semi-supervised deep hashing, domain adaptation deep hashing, and multi-modal deep hashing. In addition, we present some commonly used public datasets and the schemes used to measure the performance of deep hashing algorithms. Finally, we discuss potential research directions in the conclusion.
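To make the efficiency claim concrete, below is a minimal, hedged sketch of how hashing-based nearest neighbor search works: each database item and the query are represented as short binary codes, and search reduces to ranking by Hamming distance, which is a cheap XOR-and-popcount operation. The random codes here are stand-ins for learned hash codes; the function names are illustrative, not from any specific method surveyed.

```python
# Sketch of hashing-based nearest neighbor search over binary codes.
# Codes are stored as Python ints; Hamming distance is the popcount of XOR.

import random


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def hash_search(db_codes, query_code, k=3):
    """Return indices of the k database codes closest to the query
    in Hamming distance (brute-force ranking for clarity)."""
    ranked = sorted(range(len(db_codes)),
                    key=lambda i: hamming(db_codes[i], query_code))
    return ranked[:k]


# Toy database of random 32-bit codes standing in for learned hash codes.
random.seed(0)
NUM_BITS = 32
db = [random.getrandbits(NUM_BITS) for _ in range(1000)]
query = random.getrandbits(NUM_BITS)
print(hash_search(db, query, k=3))
```

In practice, systems avoid the brute-force scan by using multi-index hash tables or hardware popcount instructions, but the core idea of comparing compact binary codes is the same.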