In this paper, we study the angle testing problem in the context of similarity search in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and employs a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and can be both theoretically and experimentally shown to outperform Gaussian-distribution-based kernel functions. We apply the proposed kernel function to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves a 2.5X ~ 3X higher query-per-second (QPS) throughput compared to the widely-used graph-based search algorithm HNSW.
翻译:本文在高维欧氏空间相似性搜索的背景下研究角度测试问题,提出了两种基于投影的概率核函数:一种用于角度比较,另一种用于角度阈值判定。与现有依赖高斯分布随机投影向量的方法不同,我们的方法利用参考角度并采用确定性结构构建投影向量。值得注意的是,我们的核函数无需依赖渐进性假设(如投影向量数量趋于无穷),且理论和实验均证明其性能优于基于高斯分布的核函数。我们将所提出的核函数应用于近似最近邻搜索(ANNS),实验表明相较于广泛使用的基于图的搜索算法HNSW,我们的方法实现了2.5倍至3倍的每秒查询数(QPS)吞吐量提升。