Among community detection methods, spectral clustering enjoys two desirable properties: computational efficiency and theoretical guarantees of consistency. Most studies of spectral clustering consider only the edges of a network as input to the algorithm. Here we consider the problem of performing community detection in the presence of discrete node covariates, where network structure is determined by a combination of a latent block model structure and homophily on the observed covariates. We propose a spectral algorithm that we prove achieves perfect clustering with high probability on a class of large, sparse networks with discrete covariates, effectively separating latent network structure from homophily on observed covariates. To our knowledge, our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering in the setting where edge formation is dependent on both latent and observed factors.
翻译:在社区检测方法中,光谱集聚具有两种可取的特性:计算效率和一致性的理论保障。大部分光谱集聚研究只考虑网络边缘作为算法的投入。在这里,我们考虑了在离散节点共变体存在的情况下进行社区检测的问题,网络结构是由潜在区块模型结构的组合和观察到的共变体的同质决定的。我们建议了一种光谱算法,这种算法在具有离散共变体的大型、分散的网络中实现完美的集聚,其概率很高,有效地将潜在网络结构与观察到的共变体同质分离。据我们所知,我们的方法是首先保证在边缘形成取决于潜在和观察因素的环境下利用光谱聚集进行连续的潜在结构恢复。