Community detection is the problem of identifying community structure in graphs. Often the graph is modeled as a sample from the Stochastic Block Model, in which each vertex belongs to a community and the probability that two vertices are connected by an edge depends on the communities of those vertices. In this paper, we consider a model of {\em censored} community detection with two communities, in which most of the data is missing: the connectivity status of only a small fraction of the potential edges is revealed. In this model, vertices in the same community are connected with probability $p$, while vertices in opposite communities are connected with probability $q$. The connectivity status of a given pair of vertices $\{u,v\}$ is revealed with probability $\alpha$, independently across all pairs, where $\alpha = \frac{t \log(n)}{n}$. We establish the information-theoretic threshold $t_c(p,q)$, such that no algorithm succeeds in recovering the communities exactly when $t < t_c(p,q)$. We show that when $t > t_c(p,q)$, a simple spectral algorithm based on a weighted, signed adjacency matrix succeeds in recovering the communities exactly. While spectral algorithms are shown to have near-optimal performance in the symmetric case, we show that they may fail in the asymmetric case, where the connection probabilities inside the two communities are allowed to be different. In particular, we show the existence of a parameter regime where a simple two-phase algorithm succeeds but any algorithm based on the top two eigenvectors of the weighted, signed adjacency matrix fails.
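To make the censored model and the spectral step concrete, here is a minimal NumPy sketch for the symmetric two-community case only. The abstract does not specify the weights of the signed adjacency matrix. This sketch assumes log-likelihood-ratio weights: $\log(p/q)$ for a revealed edge, $\log\frac{1-p}{1-q}$ for a revealed non-edge, and $0$ for an unrevealed pair. It then labels vertices by the sign of the eigenvector of the largest eigenvalue. The helper names (`sample_censored_sbm`, `signed_adjacency`, `spectral_partition`, `accuracy`) are hypothetical. This is an illustration under these assumptions, not the authors' algorithm, and it does not cover the asymmetric case or the two-phase algorithm.

```python
import numpy as np


def sample_censored_sbm(n, p, q, t, rng):
    """Sample a symmetric two-community censored SBM.

    Returns (labels, revealed, edges): labels are +1/-1 (balanced),
    revealed[u, v] is True if the status of pair {u, v} is observed,
    and edges[u, v] is True if an edge is present (meaningful only where revealed).
    """
    labels = np.where(np.arange(n) < n // 2, 1, -1)
    rng.shuffle(labels)
    alpha = t * np.log(n) / n  # revelation probability alpha = t log(n) / n
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p, q)  # p inside a community, q across communities
    # Draw the strict upper triangle only, then symmetrize (no self-loops).
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    revealed = (rng.random((n, n)) < alpha) & upper
    edges = (rng.random((n, n)) < prob) & upper
    return labels, revealed | revealed.T, edges | edges.T


def signed_adjacency(revealed, edges, p, q):
    # ASSUMED weights (log-likelihood ratios); unrevealed pairs stay 0.
    A = np.zeros(revealed.shape)
    A[revealed & edges] = np.log(p / q)  # positive when p > q
    A[revealed & ~edges] = np.log((1 - p) / (1 - q))  # negative when p > q
    return A


def spectral_partition(A):
    # eigh returns eigenvalues in ascending order; take the top eigenvector.
    _, vecs = np.linalg.eigh(A)
    return np.where(vecs[:, -1] >= 0, 1, -1)


def accuracy(pred, labels):
    # Communities are identifiable only up to a global swap of the two labels.
    agree = np.mean(pred == labels)
    return max(agree, 1 - agree)


rng = np.random.default_rng(0)
p, q, t = 0.7, 0.3, 20.0
labels, revealed, edges = sample_censored_sbm(400, p, q, t, rng)
pred = spectral_partition(signed_adjacency(revealed, edges, p, q))
print("fraction of vertices correctly labeled:", accuracy(pred, labels))
```

The weights are chosen so that, in expectation, the signed matrix is a rank-one matrix aligned with the community labels when $p + q = 1$. That is why a single top eigenvector suffices in this symmetric setting. The abstract notes that in the asymmetric case even the top two eigenvectors of the weighted, signed adjacency matrix can fail.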