Fixed-radius nearest neighbor search is a common database operation that retrieves all data points within a user-specified distance to a query point. There are efficient approximate nearest neighbor search algorithms that provide fast query responses but they often have a very compute-intensive indexing phase and require parameter tuning. Therefore, exact brute force and tree-based search methods are still widely used. Here we propose a new fixed-radius nearest neighbor search method that significantly improves over brute force and tree-based methods in terms of index and query time, reliably returns exact results, and requires no parameter tuning. The method exploits a sorting of the data points by their first principal component, thereby facilitating a reduction in query search space. Further speedup is gained from an efficient implementation using high-level Basic Linear Algebra Subprograms (BLAS). We provide theoretical analysis of our method and demonstrate its practical performance when used stand-alone and when applied within the DBSCAN clustering algorithm.
翻译:近邻的固定半径搜索是一种常见的数据库操作,它从用户指定的距离内检索到查询点的所有数据点。有近近邻的有效搜索算法,提供快速查询响应,但往往具有非常计算密集的索引阶段,需要参数调整。因此,精确的布鲁特力和基于树的搜索方法仍然被广泛使用。在这里,我们提议一种新的固定的近邻搜索方法,在指数和查询时间方面大大改进粗力和以树为基础的方法,可靠地返回准确的结果,不需要参数调整。该方法利用数据点的第一个主部件进行分类,从而方便缩小查询搜索空间。通过使用高级基本直线 Algebra 子程序(BLAS) 的高效实施而进一步加快速度。我们提供了对方法的理论分析,并展示了在使用独立和在 DBSCAN 群集算法中应用时的实际表现。