Spectral clustering has been one of the widely used methods for community detection in networks. However, large-scale networks bring computational challenges to the eigenvalue decomposition therein. In this paper, we study the spectral clustering using randomized sketching algorithms from a statistical perspective, where we typically assume the network data are generated from a stochastic block model that is not necessarily of full rank. To do this, we first use the recently developed sketching algorithms to obtain two randomized spectral clustering algorithms, namely, the random projection-based and the random sampling-based spectral clustering. Then we study the theoretical bounds of the resulting algorithms in terms of the approximation error for the population adjacency matrix, the misclassification error, and the estimation error for the link probability matrix. It turns out that, under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm. We also extend the results to degree-corrected stochastic block models. Numerical experiments support our theoretical findings and show the efficiency of randomized methods. A new R package called Rclust is developed and made available to the public.
翻译:光谱群集是网络中社区探测广泛使用的方法之一。 然而, 大型网络给网络中的光值分解造成计算挑战。 在本文中, 我们从统计角度使用随机的草图算法研究光谱群集, 我们通常认为网络数据来自不一定完全级的随机区块模型。 为此, 我们首先使用最近开发的草图算法获得两种随机的光谱群集算法, 即随机投影和随机抽样光谱群集。 然后, 我们用人口相邻矩阵近似错误、 错误分类错误和链接概率矩阵估计错误来研究由此产生的算法的理论界限。 事实证明, 在温和的条件下, 随机光谱组算算算算法导致与原始光谱组算法相同的理论界限。 我们还将结果推广到经度校正的光谱块模型。 数字实验支持我们的理论发现, 并展示了随机分析方法的效率。 新的Rclasm 软件包被称作“ 公制” 。