Community detection is the problem of identifying community structure in graphs. Often the graph is modeled as a sample from the Stochastic Block Model, in which each vertex belongs to a community. The probability that two vertices are connected by an edge depends on the communities of those vertices. In this paper, we consider a model of censored community detection with two communities, where most of the data is missing as the status of only a small fraction of the potential edges is revealed. In this model, vertices in the same community are connected with probability $p$ while vertices in opposite communities are connected with probability $q$. The connectivity status of a given pair of vertices $\{u,v\}$ is revealed with probability $\alpha$, independently across all pairs, where $\alpha = \frac{t \log(n)}{n}$. We establish the information-theoretic threshold $t_c(p,q)$, such that no algorithm succeeds in recovering the communities exactly when $t < t_c(p,q)$. We show that when $t > t_c(p,q)$, a simple spectral algorithm based on a weighted, signed adjacency matrix succeeds in recovering the communities exactly. While spectral algorithms are shown to have near-optimal performance in the symmetric case, we show that they may fail in the asymmetric case where the connection probabilities inside the two communities are allowed to be different. In particular, we show the existence of a parameter regime where a simple two-phase algorithm succeeds but any algorithm based on thresholding a linear combination of the top two eigenvectors of the weighted, signed adjacency matrix fails.
翻译:社区检测是图形中社区结构的识别问题。 通常, 图形是来自Stochastic Block 模型的样本, 每个顶端都属于一个社区。 两个顶端连接到两个顶端的概率取决于这些顶端的社区。 在本文中, 我们考虑用两个社区来检测受检查的社区模式, 大部分数据都缺少, 仅仅是潜在边缘中一小部分的状态被披露。 在这个模型中, 同一个社区的顶端与概率挂钩, 概率为$p$, 而相反社区中的顶端与概率为$q美元。 给定的顶端的顶端是$_ u, vv$ 的连接状态取决于这些顶端。 在所有对口中, $\alpha=\ farac{ t\log( n)\\\\\n}} $。 我们建立信息- 直位值阈值的顶端值 $t_ c( p) 基数值为$ t_ cq), 直径( t) 直径( sq) 直径直径的直径的直径直径的直径直径方的直径值社区, 直径的直方值的直方位值无法。 我们显示的直径直径方的直方的直方位的直方位的直方位的直方位值在正方位的直方位值中, 直方位的直方位的直方位的直方位的直方位( 。 我们方位的直方位的直方位的直方位的直方位次方位次方位表示显示的直方位值。