Spectral clustering is a popular tool in network data analysis, with applications across a variety of scientific areas. However, many studies have shown that spectral clustering does not perform well on certain network structures, particularly core-periphery networks. To improve clustering performance in core-periphery structures, Adjacency Spectral Embedding (ASE) has been introduced, which performs clustering via a network's adjacency matrix instead of the graph Laplacian. Despite its advantages in this setting, ASE performs optimally only on dense networks, whereas network data observed in practice are often sparse. To address this limitation, we propose a new approach, which we term Doubled Adjacency Spectral Embedding (DASE), motivated by the observation that the squared adjacency matrix leverages the few connections in sparse structures more efficiently in clustering applications. Theoretical results establish that DASE enjoys good consistency properties when determining sparse community structure. The performance and general applicability of the proposed method are evaluated using extensive simulations on both directed and undirected networks. Our results highlight improved clustering performance on both sparse and dense networks in the presence of core-periphery structures. We illustrate the proposed technique on real-world employment and transportation datasets.
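The contrast between embedding the adjacency matrix and embedding its square can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DASE procedure itself: the abstract does not give the algorithmic details, so the sketch assumes an undirected network, takes the leading eigenvectors of `A @ A` as the "doubled" embedding, and clusters with a plain k-means. The helper names (`sample_core_periphery`, `spectral_embedding`, `kmeans`) and the block probabilities are hypothetical choices made for the example.

```python
import numpy as np


def sample_core_periphery(n_core, n_periphery, p_cc, p_cp, p_pp, rng):
    """Sample a symmetric, hollow adjacency matrix from a two-block model.

    Block 0 is the core and block 1 the periphery (illustrative setup only).
    """
    n = n_core + n_periphery
    labels = np.array([0] * n_core + [1] * n_periphery)
    B = np.array([[p_cc, p_cp], [p_cp, p_pp]])
    P = B[labels][:, labels]                       # n x n edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)   # strict upper triangle
    A = (upper | upper.T).astype(float)            # symmetrise, no self-loops
    return A, labels


def spectral_embedding(M, d):
    """Embed with the d eigenvectors of symmetric M of largest |eigenvalue|,
    each scaled by the square root of that |eigenvalue|."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns one cluster index per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign


rng = np.random.default_rng(0)
# Assumed parameters: dense core, sparse periphery.
A, truth = sample_core_periphery(30, 70, p_cc=0.5, p_cp=0.2, p_pp=0.05, rng=rng)

X_ase = spectral_embedding(A, d=2)        # embedding of the adjacency matrix
X_dase = spectral_embedding(A @ A, d=2)   # assumed "doubled" variant: embed A^2

labels_ase = kmeans(X_ase, 2, rng)
labels_dase = kmeans(X_dase, 2, rng)
```

Comparing `labels_ase` and `labels_dase` against `truth` (up to label permutation) gives a rough feel for how the two embeddings behave as the block probabilities are lowered. Any such comparison reflects this simplified sketch rather than the method evaluated in the paper.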