Spectral clustering is a popular method for community detection in networks under the assumption of the standard stochastic blockmodel. Given a matrix representation of the graph, such as the adjacency matrix, the nodes are clustered on a low-dimensional projection obtained from a truncated spectral decomposition of that matrix. Accurate estimation of the number of communities and of the dimension of the reduced latent space is crucial for good performance of spectral clustering algorithms. Real-world networks, such as computer networks studied in cyber-security applications, often exhibit heterogeneous within-community degree distributions, which are better addressed by the degree-corrected stochastic blockmodel. This article proposes a novel, model-based method for simultaneous and automated selection of the number of communities and the latent dimension for spectral clustering under the degree-corrected stochastic blockmodel. The method is based on a transformation of the spectral embedding to spherical coordinates, and on a novel modelling assumption in the transformed space, which is then embedded into an existing model selection framework for estimating the number of communities and the latent dimension. Results show improved performance over competing methods on simulated and real-world computer network data.
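The two geometric steps named in the abstract, a truncated spectral decomposition of the adjacency matrix followed by a transformation to spherical coordinates, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `adjacency_spectral_embedding` and `to_spherical_angles` are hypothetical, the eigenvector scaling and the angle convention are the standard textbook ones and may differ from those used in the article, and the toy graph parameters are invented. The model-based selection of the number of communities and latent dimension is not shown.

```python
import numpy as np


def adjacency_spectral_embedding(A, d):
    """Embed nodes with the d eigenpairs of largest magnitude of a symmetric matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    # Scale each eigenvector by the square root of its absolute eigenvalue.
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))


def to_spherical_angles(X):
    """Map each row of X (n x d, d >= 2) to d - 1 hyperspherical angles, discarding the radius."""
    n, d = X.shape
    theta = np.empty((n, d - 1))
    for j in range(d - 2):
        # Angle between coordinate j and the norm of the remaining coordinates, in [0, pi].
        tail = np.linalg.norm(X[:, j + 1:], axis=1)
        theta[:, j] = np.arctan2(tail, X[:, j])
    # The last angle keeps the sign of the final coordinate, in (-pi, pi].
    theta[:, d - 2] = np.arctan2(X[:, d - 1], X[:, d - 2])
    return theta


# Toy two-community graph with heterogeneous degrees (hypothetical parameters).
rng = np.random.default_rng(0)
n = 200
z = np.repeat([0, 1], n // 2)                 # community labels
rho = rng.uniform(0.2, 1.0, size=n)           # node-specific degree-correction weights
B = np.array([[0.5, 0.1], [0.1, 0.4]])        # block connectivity matrix
P = np.outer(rho, rho) * B[z][:, z]           # edge probabilities
upper = np.triu(rng.random((n, n)) < P, k=1)  # sample the upper triangle only
A = (upper | upper.T).astype(float)           # symmetric, no self-loops

X = adjacency_spectral_embedding(A, d=2)
theta = to_spherical_angles(X)
print(X.shape, theta.shape)
```

Because the angles are unchanged when a row of the embedding is multiplied by a positive constant, nodes that differ only in a degree-correction weight map to the same point in the transformed space. This scale invariance is the reason an angular representation is a natural fit for degree-heterogeneous communities.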