Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than $1$) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.
翻译:社会网络、 知识图、 推荐系统、 生命科学 和决策问题 中通常会出现大图 。 以高层次特性对大图进行汇总有助于解决这些环境中的问题。 在光谱集中, 我们的目标是确定节点的组群, 大部分边缘都位于群集内, 只有很少的边缘位于群集之间。 这项任务对于许多下游应用和探索性分析很重要。 光谱集的核心步骤正在对相应的图 Laplaceian 矩阵( 或相当的, 单值变异, SVD, 事件矩阵的 SVD ) 进行分解。 迭代单值变异化方法的趋同取决于给定矩阵频谱的微增缩图, 也就是说, 连续的egen值之间的差。 对于一个与精密组合图相对应的图形, egen值将是一个非异性但非常小( 低于 $1美元) 慢化的趋同。 本文介绍了一种可平行的分解方法, 以加速 SVD 解器和转而转而转而来, 连续的光谱变的矩阵 。 通过这个实验性矩阵的变的模型将可以实现到 。 速度化矩阵 。 通过一个可实现 的模型变的模型的 。