使用集成应用程序的近邻搜索 (Exact fixed-radius nearest neighbor search with an application to clustering)

Fixed-radius nearest-neighbor search is a common database operation that retrieves all data points within a user-specified distance to a query point. There are efficient approximate nearest neighbor search algorithms that provide fast query responses but they often have a very compute-intensive indexing phase and require parameter tuning. Therefore, exact brute force and tree-based search methods are still widely used. Here we propose a new fixed-radius nearest neighbor search method that significantly improves over brute force and tree-based methods in terms of index and query time, returns exact results, and requires no parameter tuning. The method exploits a sorting of the data points by their first principal component, thereby facilitating a reduction in query search space. Further speedup is gained from an efficient implementation using high-level Basic Linear Algebra Subprograms (BLAS). We provide theoretical analysis of our method and demonstrate its practical performance when used stand-alone and when applied within a clustering algorithm.

翻译：近距离近邻搜索是一种常见的数据库操作, 将用户指定距离内的所有数据点检索到查询点。有高效近近邻搜索算法, 提供快速查询回复, 但通常具有非常计算密集的索引阶段, 需要参数调整。因此, 精确的布鲁特力和以树为基础的搜索方法仍然被广泛使用。我们在此提议一种新的近距离固定半径搜索方法, 大大改进了粗力和以树为基础的方法, 在索引和查询时间方面, 返回准确的结果, 不需要参数调控。该方法利用数据点的第一个主部件进行分类, 从而便利查询搜索空间的缩小。使用高水平的 Basintar Algebra 子程序( BLAAS) 来高效实施, 我们提供对方法的理论分析, 并演示其独立使用和在群集算法中应用时的实际性能。