In network inference applications, it is often desirable to detect community structure, namely to cluster vertices into groups, or blocks, according to some measure of similarity. Beyond mere adjacency matrices, many real networks also involve vertex covariates that carry key information about underlying block structure in graphs. To assess the effects of such covariates on block recovery, we present a comparative analysis of two model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates. The first algorithm uses only the adjacency matrix, and directly estimates the induced block assignments. The second algorithm incorporates both the adjacency matrix and the vertex covariates into the estimation of block assignments, and moreover quantifies the explicit impact of the vertex covariates on the resulting estimate of the block assignments. We employ Chernoff information to analytically compare the algorithms' performance and derive the Chernoff ratio for certain models of interest. Analytic results and simulations suggest that the second algorithm is often preferred: we can often better estimate the induced block assignments by first estimating the effect of vertex covariates. In addition, real data examples on diffusion MRI connectome datasets and social network datasets also indicate that the second algorithm has the advantages of revealing underlying block structure and taking observed vertex heterogeneity into account in real applications. Our findings emphasize the importance of distinguishing between observed and unobserved factors that can affect block structure in graphs.
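As an illustration of the first, adjacency-only approach, the following is a minimal sketch and not the authors' implementation. It assumes a two-block stochastic blockmodel with hypothetical block probabilities, embeds the vertices using the leading eigenpairs of the adjacency matrix, and clusters the embedding with a simple k-means. The function names (`sample_sbm`, `adjacency_spectral_embedding`, `kmeans`) and all parameter values are assumptions made for this example. The covariate-adjusted second algorithm is not shown, since its details are not given here.

```python
import numpy as np

def sample_sbm(block_sizes, B, rng):
    """Sample a symmetric, hollow adjacency matrix from a stochastic blockmodel."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[labels][:, labels]                        # edge probability per vertex pair
    upper = np.triu(rng.random(P.shape) < P, k=1)   # sample the strict upper triangle
    A = (upper | upper.T).astype(float)             # symmetrize; diagonal stays zero
    return A, labels

def adjacency_spectral_embedding(A, d):
    """Embed vertices using the d largest-magnitude eigenpairs of A."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

# Hypothetical two-block model: dense within blocks, sparse between them.
rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1],
              [0.1, 0.5]])
A, labels = sample_sbm([100, 100], B, rng)
X_hat = adjacency_spectral_embedding(A, d=2)
pred = kmeans(X_hat, k=2)

# Cluster labels are only identified up to permutation, so check both matchings.
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"block recovery accuracy: {accuracy:.3f}")
```

The sketch uses only the adjacency matrix, so any vertex covariates that influence edge formation would be absorbed into the estimated blocks; this is the situation that motivates the comparison with the covariate-aware algorithm.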