The $k$-nearest neighbor graph (KNNG) on high-dimensional data is a data structure widely used in many applications such as similarity search, dimension reduction and clustering. Due to its increasing popularity, several methods under the same framework have been proposed in the past decade. This framework contains two steps, i.e. building an initial KNNG (denoted as \texttt{INIT}) and then refining it by neighborhood propagation (denoted as \texttt{NBPG}). However, there remain several questions to be answered. First, it lacks a comprehensive experimental comparison among representative solutions in the literature. Second, some recently proposed indexing structures, e.g., SW and HNSW, have not been used or tested for building an initial KNNG. Third, the relationship between the data property and the effectiveness of \texttt{NBPG} is still not clear. To address these issues, we comprehensively compare the representative approaches on real-world high-dimensional data sets to provide practical and insightful suggestions for users. As the first attempt, we take SW and HNSW as the alternatives of \texttt{INIT} in our experiments. Moreover, we investigate the effectiveness of \texttt{NBPG} and find the strong correlation between the huness phenomenon and the performance of \texttt{NBPG}.
翻译:在高维数据上, $k$- 近邻图( KNNG) 是在许多应用中广泛使用的数据结构, 如相似搜索、 维度减少和集群。 由于它越来越受欢迎, 过去十年中在同一框架下提出了几种方法。 这个框架包含两个步骤, 即建立初始的KNNG( 称为\ textt{ INIT} ), 然后通过邻里传播来改进它( 标记为\ textt{ NBPG} ) 。 但是, 还有一些问题需要解答。 首先, 它缺乏对文献中具有代表性的解决方案的全面实验性比较。 其次, 最近提出的一些索引结构, 例如, SW 和 HNWW, 没有被使用或测试来建立初始的 KNNNNG。 第三, 数据属性与\ textt{ NBGPG} 有效性之间的关系仍然不清楚。 为了解决这些问题, 我们全面比较了在现实世界高维数据集上的代表性方法, 以便为用户提供实用和深刻的建议。 作为第一次尝试, 我们采取SW 和HNFNF\ textreal 的替代品 。