Approximate nearest neighbor (ANN) search in high-dimensional metric spaces is a fundamental problem with many applications. Over the past decade, proximity graph (PG)-based indexes have demonstrated superior empirical performance over alternatives. However, these methods often lack theoretical guarantees regarding the quality of query results, especially in the worst-case scenarios. In this paper, we introduce the {\alpha}-convergent graph ({\alpha}-CG), a new PG structure that employs a carefully designed edge pruning rule. This rule eliminates candidate neighbors for each data point p by applying the shifted-scaled triangle inequalities among p, its existing out-neighbors, and new candidates. If the distance between the query point q and its exact nearest neighbor v* is at most {\tau} for some constant {\tau} > 0, our {\alpha}-CG finds the exact nearest neighbor in poly-logarithmic time, assuming bounded intrinsic dimensionality for the dataset; otherwise, it can find an ANN in the same time. To enhance scalability, we develop the {\alpha}-convergent neighborhood graph ({\alpha}-CNG), a practical variant that applies the pruning rule locally within each point's neighbors. We also introduce optimizations to reduce the index construction time. Experimental results show that our {\alpha}-CNG outperforms existing PGs on real-world datasets. For most datasets, {\alpha}-CNG can reduce the number of distance computations and search steps by over 15% and 45%, respectively, when compared with the best-performing baseline.
翻译:暂无翻译