A Shared Nearest Neighbor (SNN) graph is a graph constructed from shared nearest neighbor information, a secondary similarity measure based on the rankings induced by a primary $k$-nearest neighbor ($k$-NN) measure. SNN measures have been touted as less prone to the curse of dimensionality than conventional distance measures, and methods using SNN graphs are therefore widely used in applications, particularly for clustering high-dimensional data sets and for finding outliers in subspaces of high-dimensional data. Despite this, SNN graphs and their graph Laplacians remain theoretically unexplored. In this work, we make the first contribution in this direction. We show that the large-scale asymptotics of an SNN graph Laplacian reach a consistent continuum limit, and that this limit is the same as that of a $k$-NN graph Laplacian. Moreover, we show that, with high probability, the pointwise convergence rate of the graph Laplacian is linear with respect to $(k/n)^{1/m}$.
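To make the construction concrete, the following is a minimal sketch of one common SNN weighting, in which the weight between two points is the number of neighbors their $k$-NN lists share. The abstract does not state the exact definition used in the paper, so the weighting rule, the use of Euclidean distance as the primary measure, the unnormalized Laplacian, and the function names `knn_indices`, `snn_weights`, and `unnormalized_laplacian` are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded).

    Assumes Euclidean distance as the primary measure.
    """
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def snn_weights(X, k):
    """SNN weight matrix: W[i, j] = number of k-NN shared by points i and j.

    This counting rule is one common choice, assumed here for illustration.
    """
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    A = np.zeros((n, n), dtype=int)            # A[i, j] = 1 if j is in kNN(i)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1
    W = A @ A.T                                # row overlaps = shared-neighbor counts
    np.fill_diagonal(W, 0)                     # no self-loops
    return W


def unnormalized_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W


# Two well-separated groups on a line; shared neighbors only occur within a group.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
W = snn_weights(X, k=2)
L = unnormalized_laplacian(W)
```

Because the weights depend only on neighbor rankings, any monotone rescaling of the primary distance leaves `W` unchanged, which is the sense in which SNN is a secondary, rank-based similarity.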