Empirical observations suggest that in practice, community membership does not completely explain the dependency between the edges of an observation graph. The residual dependence of the graph edges are modeled in this paper, to first order, by auxiliary node latent variables that affect the statistics of the graph edges but carry no information about the communities of interest. We then study community detection in graphs obeying the stochastic block model and censored block model with auxiliary latent variables. We analyze the conditions for exact recovery when these auxiliary latent variables are unknown, representing unknown nuisance parameters or model mismatch. We also analyze exact recovery when these secondary latent variables have been either fully or partially revealed. Finally, we propose a semidefinite programming algorithm for recovering the desired labels when the secondary labels are either known or unknown. We show that exact recovery is possible by semidefinite programming down to the respective maximum likelihood exact recovery threshold.
翻译:经验观测表明,在实践中,社区成员并不完全解释观测图边缘之间的依赖性。 图形边缘的剩余依赖性在本文中先用辅助节点潜在变量进行模型化,这些变量影响图形边缘的统计,但没有关于相关社区的信息。 然后我们用符合随机区块模型的图表和带有辅助潜伏变量的受审查区块模型来研究社区检测。 我们分析这些辅助潜在变量未知时的确切恢复条件, 代表未知的扰动参数或模型不匹配。 我们还分析这些次要潜在变量完全或部分被披露时的确切恢复情况。 最后, 我们提出在次级标签为已知或未知时恢复所需标签的半确定性程序算法。 我们表明,通过半确定程序到相应的最大可能性准确恢复阈值, 精确恢复是可能的。