The stochastic block model (SBM) is a fundamental model for studying graph clustering or community detection in networks. It has received great attention in the last decade and the balanced case, i.e., assuming all clusters have large size, has been well studied. However, our understanding of SBM with unbalanced communities (arguably, more relevant in practice) is still very limited. In this paper, we provide a simple SVD-based algorithm for recovering the communities in the SBM with communities of varying sizes. We improve upon a result of Ailon, Chen and Xu [ICML 2013] by removing the assumption that there is a large interval such that the sizes of clusters do not fall in. Under the planted clique conjecture, the size of the clusters that can be recovered by our algorithm is nearly optimal (up to polylogarithmic factors) when the probability parameters are constant. As a byproduct, we obtain a polynomial-time algorithm with sublinear query complexity for a clustering problem with a faulty oracle, which finds all clusters of size larger than $\tilde{\Omega}({\sqrt{n}})$ even if $\Omega(n)$ small clusters co-exist in the graph. In contrast, all the previous efficient algorithms that makes sublinear number of queries cannot recover any large cluster, if there are more than $\tilde{\Omega}(n^{2/5})$ small clusters.
翻译:光学区块模型(SBM)是研究图集或网络社区探测的基本模型,在过去十年中受到极大关注,平衡的假设,即假设所有组群规模大,已经研究过。然而,我们对于有不平衡社区(在实际中可能更相关)的SBM的理解仍然非常有限。在本文中,我们提供了一种简单的基于SVD的算法,用于利用不同规模的社区恢复SBM社区。在Ailon、Chen和Xu[ICML 2013]之后,我们得到了改进,我们取消了这样的假设,即:存在着一个大的间隔,即所有组群群的大小不会缩小。在所配置的分区曲线下,当概率参数保持不变时,我们可以通过我们的算法回收的组群的规模几乎是最佳的(最高至于多元性系数)。作为副产品,我们得到了一个具有断层或断层的多时算法,发现所有组群体大小都大于$=O5美元,如果前数组组群体小的组群体不能恢复。