We study the task of clustering in directed networks. We show that using the eigenvalue/eigenvector decomposition of the adjacency matrix is simpler than common methods based on a combination of data regularization and SVD truncation, and that it works well down to the very sparse regime where the edge density has constant order. Our analysis is based on a Master Theorem describing sharp asymptotics for isolated eigenvalues/eigenvectors of sparse, non-symmetric matrices with independent entries. We also describe the limiting distribution of the entries of these eigenvectors; for the task of digraph clustering with spectral embeddings, this leads us to provide numerical evidence for the superiority of Gaussian mixture clustering over the widely used k-means algorithm.
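The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes a toy directed stochastic block model with hypothetical connection probabilities, embeds nodes via the leading eigenvectors of the (non-symmetric) adjacency matrix, and clusters the embedding with a Gaussian mixture model.

```python
# Sketch: spectral clustering of a directed graph via adjacency eigenvectors.
# The SBM parameters (n, k, P) are illustrative assumptions, not from the paper.
import numpy as np
from scipy.sparse.linalg import eigs
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy directed stochastic block model with 2 communities.
n, k = 200, 2
labels = rng.integers(0, k, size=n)
P = np.array([[0.10, 0.02],
              [0.02, 0.10]])  # hypothetical within/between connection probabilities
A = (rng.random((n, n)) < P[labels][:, labels]).astype(float)
np.fill_diagonal(A, 0.0)  # no self-loops

# Leading k eigenvectors of the adjacency matrix. A is non-symmetric, so the
# eigenpairs are complex in general; stack real and imaginary parts as features.
vals, vecs = eigs(A, k=k)
X = np.hstack([vecs.real, vecs.imag])

# Gaussian mixture clustering of the spectral embedding.
pred = GaussianMixture(n_components=k, random_state=0).fit_predict(X)

# Accuracy up to the label permutation (k = 2 case).
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"accuracy: {acc:.2f}")
```

With two clusters, the label permutation can be handled by taking the better of the two assignments; for larger k one would match clusters via the Hungarian algorithm.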