k-nearest neighbor graph is a fundamental data structure in many disciplines such as information retrieval, data-mining, pattern recognition, and machine learning, etc. In the literature, considerable research has been focusing on how to efficiently build an approximate k-nearest neighbor graph (k-NN graph) for a fixed dataset. Unfortunately, a closely related issue of how to merge two existing k-NN graphs has been overlooked. In this paper, we address the issue of k-NN graph merging in two different scenarios. In the first scenario, a symmetric merge algorithm is proposed to combine two approximate k-NN graphs. The algorithm facilitates large-scale processing by the efficient merging of k-NN graphs that are produced in parallel. In the second scenario, a joint merge algorithm is proposed to expand an existing k-NN graph with a raw dataset. The algorithm enables the incremental construction of a hierarchical approximate k-NN graph. Superior performance is attained when leveraging the hierarchy for NN search of various data types, dimensionality, and distance measures.
翻译:k- nearest 邻居图形是信息检索、数据挖掘、模式识别和机器学习等许多学科的基本数据结构。 在文献中,大量研究一直侧重于如何高效率地为固定数据集构建一个近k- nest邻居图形(k- NN 图形)。 不幸的是,一个密切相关的关于如何合并两个现有的 k- NN 图形的问题被忽略了。在本文中,我们处理K- NN 图形在两种不同情景下合并的问题。在第一种情景中,建议采用对称合并算法,将两个近k- NN 图形合并在一起。算法通过同时生成的 k- NN 图形的高效合并,促进大规模处理。在第二种情景中,建议采用联合合并算法,将现有的 k- NN 图形与一个原始数据集一起扩展。 算法使得能够逐步构建一个等级近K- NN 图形。 当利用N 搜索各种数据类型、 维度和距离测量的等级时,将达到高级性。