We study the Personalized PageRank (PPR) algorithm, a local spectral method for clustering, which extracts clusters using locally-biased random walks around a given seed node. In contrast to previous work, we adopt a classical statistical learning setup, where we obtain samples from an unknown nonparametric distribution, and aim to identify sufficiently salient clusters. We introduce a trio of population-level functionals -- the normalized cut, conductance, and local spread, analogous to graph-based functionals of the same name -- and prove that PPR, run on a neighborhood graph, recovers clusters with small population normalized cut and large conductance and local spread. We apply our general theory to establish that PPR identifies connected regions of high density (density clusters) that satisfy a set of natural geometric conditions. We also show a converse result, that PPR can fail to recover geometrically poorly-conditioned density clusters, even asymptotically. Finally, we provide empirical support for our theory.
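To make the procedure concrete, the following is a minimal sketch of PPR clustering on a neighborhood graph built from sample points. It is an illustration under stated assumptions, not the exact variant analyzed in the paper: the function names (`neighborhood_graph`, `personalized_pagerank`, `sweep_cut`), the lazy random walk, the teleportation parameter `alpha`, the fixed-iteration solver, and the choice of the minimum-conductance sweep cut are all assumptions made for this example. The data are a deterministic toy set (two well-separated grids of points) standing in for samples from an unknown distribution.

```python
import numpy as np

def neighborhood_graph(X, radius):
    """Adjacency matrix of the unweighted radius-neighborhood graph on rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = (d2 <= radius ** 2).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def personalized_pagerank(A, seed, alpha=0.1, n_iter=200):
    """Approximate the PPR vector p solving p = alpha * e_seed + (1 - alpha) * p W,
    where W = (I + D^{-1} A) / 2 is the lazy random walk (an assumed convention)."""
    n = A.shape[0]
    deg = np.maximum(A.sum(axis=1), 1.0)  # guard against isolated nodes
    W = 0.5 * (np.eye(n) + A / deg[:, None])
    e = np.zeros(n)
    e[seed] = 1.0
    p = e.copy()
    for _ in range(n_iter):  # fixed-point iteration; error shrinks like (1 - alpha)^t
        p = alpha * e + (1.0 - alpha) * (p @ W)
    return p

def sweep_cut(A, p):
    """Order nodes by p / degree and return the prefix with the smallest conductance."""
    deg = A.sum(axis=1)
    order = np.argsort(-p / np.maximum(deg, 1.0))
    total_vol = deg.sum()
    in_set = np.zeros(len(p), dtype=bool)
    vol, cut = 0.0, 0.0
    best_phi, best_k = np.inf, 0
    for k, u in enumerate(order[:-1], start=1):
        # Edges from u into the current set stop being cut; its other edges start being cut.
        cut += deg[u] - 2.0 * A[u, in_set].sum()
        vol += deg[u]
        in_set[u] = True
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return order[:best_k], best_phi

# Toy data: two 5x5 grids of points, far apart (hypothetical stand-in for samples).
g = np.arange(5) * 0.1
block = np.array([(x, y) for x in g for y in g])
X = np.vstack([block, block + np.array([5.0, 0.0])])

A = neighborhood_graph(X, radius=0.15)
p = personalized_pagerank(A, seed=0, alpha=0.1)
cluster, phi = sweep_cut(A, p)
print(sorted(cluster.tolist()), phi)
```

In this toy setting the two grids are disconnected in the neighborhood graph, so the random walk started at the seed never leaves its own grid and the sweep cut isolates it. With real samples, the radius and `alpha` would need to be tuned, and the quality of the recovered cluster depends on the geometric conditions the abstract describes.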