The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study on the problem have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting both from a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, such a fundamental limitation will disappear completely. We prove that with arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structures. Our work brings a new perspective to stochastic model of networks and semidefinite program research.
翻译:随机随机图形模型,用于在网络结构数据上进行集群和社区探测。经过数十年的广泛研究,已经取得了许多深刻的成果,其中从数学角度和应用角度来看,Kesten-Stigum临界值的阶段过渡特别有趣。它指出,如果模型参数低于某些临界值,任何基于网络地形的估算器都无法比稀薄图形的概率高得多。然而,如果我们稍微将地平线扩展至无处不在的半监督设置,这种基本限制将完全消失。我们证明,随着标签的任意部分暴露,检测问题在整个参数领域都是可行的。此外,我们引入了两种高效的算法,一种组合法和一种基于优化的算法,将标签信息与图形结构结合起来。我们的工作为网络的随机模型和半确定程序研究带来了新的视角。