Community detection is an important problem in the analysis of network data. In many real data sets, the adjacency matrix is too sparse at some nodes for existing methods to recover any community information. Covariates have been shown to help community detection. Combining them with the network is challenging, however, because the covariates may be high-dimensional and their class labels may be inconsistent with the communities of the network. To quantify the relationship between the covariates and the network, we propose a general model, called the covariate-assisted degree-corrected stochastic block model (CA-DCSBM). Based on CA-DCSBM, we design the adjusted neighbor-covariate (ANC) data matrix, which leverages covariate information to assist community detection. We then prove that spectral clustering on the ANC matrix combines the information in the network and the covariates. The resulting method, named CA-SCORE, is shown to have the oracle property under mild conditions. In particular, we show that our framework covers challenging scenarios in which the adjacency matrix carries no community information, or in which the covariate matrix has community labels that differ from those of the adjacency matrix. Finally, we apply CA-SCORE to several synthetic and real data sets and show that it outperforms other community detection methods.