用于近似最近邻搜索的快速收敛邻近图 (Fast-Convergent Proximity Graphs for Approximate Nearest Neighbor Search)

Approximate nearest neighbor (ANN) search in high-dimensional metric spaces is a fundamental problem with many applications. Over the past decade, proximity graph (PG)-based indexes have demonstrated superior empirical performance over alternatives. However, these methods often lack theoretical guarantees regarding the quality of query results, especially in the worst-case scenarios. In this paper, we introduce the α-convergent graph (α-CG), a new PG structure that employs a carefully designed edge pruning rule. This rule eliminates candidate neighbors for each data point p by applying the shifted-scaled triangle inequalities among p, its existing out-neighbors, and new candidates. If the distance between the query point q and its exact nearest neighbor v* is at most τ for some constant τ > 0, our α-CG finds the exact nearest neighbor in poly-logarithmic time, assuming bounded intrinsic dimensionality for the dataset; otherwise, it can find an ANN in the same time. To enhance scalability, we develop the α-convergent neighborhood graph (α-CNG), a practical variant that applies the pruning rule locally within each point's neighbors. We also introduce optimizations to reduce the index construction time. Experimental results show that our α-CNG outperforms existing PGs on real-world datasets. For most datasets, α-CNG can reduce the number of distance computations and search steps by over 15% and 45%, respectively, when compared with the best-performing baseline.

翻译：高维度量空间中的近似最近邻（ANN）搜索是一个具有广泛应用的基础性问题。在过去十年中，基于邻近图（PG）的索引方法在实证性能上已展现出优于其他方法的优势。然而，这些方法往往缺乏关于查询结果质量的理论保证，尤其是在最坏情况下。本文提出了一种新的PG结构——α收敛图（α-CG），它采用了一种精心设计的边剪枝规则。该规则通过在数据点p、其现有出边邻居以及新候选点之间应用平移缩放三角不等式，来消除每个数据点p的候选邻居。若查询点q与其精确最近邻v*之间的距离对于某个常数τ > 0满足不大于τ，则假设数据集具有有界本征维度时，我们的α-CG能在多对数时间内找到精确最近邻；否则，它能在相同时间内找到一个近似最近邻。为提升可扩展性，我们开发了α收敛邻域图（α-CNG），这是一种实用变体，它在每个点的邻域内局部应用剪枝规则。我们还引入了优化措施以减少索引构建时间。实验结果表明，我们的α-CNG在真实世界数据集上优于现有PG方法。对于大多数数据集，与性能最佳的基线方法相比，α-CNG可将距离计算次数和搜索步骤数分别减少超过15%和45%。