Approximate nearest neighbor search (ANNS) on high-dimensional vectors has become a fundamental and essential component in various machine learning tasks. Prior research has shown that the distance comparison operation is the bottleneck of ANNS, which determines the query and indexing performance. To overcome this challenge, some novel methods have been proposed recently. The basic idea is to estimate the actual distance with fewer calculations, at the cost of accuracy loss. Inspired by this, we also propose that some classical techniques and deep learning models can also be adapted to this purpose. In this paper, we systematically categorize the techniques that have been or can be used to accelerate distance approximation. And to help the users understand the pros and cons of different techniques, we design a fair and comprehensive benchmark, Fudist implements these techniques with the same base index and evaluates them on 16 real datasets with several evaluation metrics. Designed as an independent and portable library, Fudist is orthogonal to the specific index structure and thus can be easily utilized in the current ANNS library to achieve significant improvements.
翻译:暂无翻译