Fast and reliable K-Nearest Neighbor Graph algorithms are more important than ever because of their widespread use in data processing techniques. This paper presents a runtime-optimized C implementation of the heuristic "NN-Descent" algorithm by Wei Dong et al. for the l2-distance metric. It explains several implementation optimizations that improve performance on both low-dimensional and high-dimensional datasets. Optimizations that speed up the selection of datapoint pairs for distance evaluation primarily benefit low-dimensional datasets. A heuristic is presented that exploits the iterative nature of NN-Descent to reorder data in memory, which enables better use of locality and thereby improves the runtime. The restriction to the l2-distance metric allows the use of blocked distance evaluations, which significantly increase performance on high-dimensional datasets. In combination, the optimizations yield an implementation that significantly outperforms a widely used implementation of NN-Descent on all considered datasets. For instance, the runtime on the popular MNIST handwritten digits dataset is halved.
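To illustrate what a blocked distance evaluation can look like, the following is a minimal sketch in C and not the implementation described in this paper. It computes all squared l2 distances between two small groups of points in one pass over the coordinates, so that each loaded coordinate is reused for several pairs. The block size `BLOCK` and the function names `sq_l2` and `sq_l2_block` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 4 /* hypothetical block size, for illustration only */

/* Reference: squared l2 distance between two dim-dimensional points. */
static float sq_l2(const float *a, const float *b, size_t dim) {
    float acc = 0.0f;
    for (size_t k = 0; k < dim; ++k) {
        float d = a[k] - b[k];
        acc += d * d;
    }
    return acc;
}

/* Blocked evaluation (assumed layout: row-major, dim floats per point).
 * Fills out[i][j] with the squared l2 distance between row i of xs and
 * row j of ys. For every coordinate k, each value loaded from xs is
 * reused for BLOCK pairs instead of being reloaded once per pair. */
static void sq_l2_block(const float *xs, const float *ys, size_t dim,
                        float out[BLOCK][BLOCK]) {
    assert(dim > 0);
    for (size_t i = 0; i < BLOCK; ++i)
        for (size_t j = 0; j < BLOCK; ++j)
            out[i][j] = 0.0f;

    for (size_t k = 0; k < dim; ++k) {
        for (size_t i = 0; i < BLOCK; ++i) {
            float xi = xs[i * dim + k];
            for (size_t j = 0; j < BLOCK; ++j) {
                float d = xi - ys[j * dim + k];
                out[i][j] += d * d;
            }
        }
    }
}
```

The inner loops have fixed trip counts, which makes it easier for a compiler to keep the `BLOCK` x `BLOCK` accumulators in registers and to vectorize; whether this pays off in practice depends on the dimension, the block size, and the target hardware.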