In network analysis, estimating the number of communities $K$ is a fundamental problem. We consider a broad setting that allows severe degree heterogeneity and a wide range of sparsity levels, and propose Stepwise Goodness-of-Fit (StGoF) as a new approach. StGoF is a stepwise algorithm: for $m = 1, 2, \ldots$, it alternates between a community detection step and a goodness-of-fit (GoF) step. We adapt SCORE \cite{SCORE} for community detection and propose a new GoF metric. We show that at step $m$, the GoF metric diverges to $\infty$ in probability for all $m < K$ and converges in distribution to $N(0,1)$ if $m = K$. This gives rise to a consistent estimate for $K$. We also identify the appropriate definition of the signal-to-noise ratio (SNR) for this problem and show that consistent estimates for $K$ do not exist if $\mathrm{SNR} \to 0$, while StGoF is uniformly consistent for $K$ if $\mathrm{SNR} \to \infty$. Therefore, StGoF achieves the optimal phase transition. Similar stepwise methods (e.g., \cite{wang2017likelihood, ma2018determining}) are known to face analytical challenges. We overcome these challenges by using a different stepwise scheme in StGoF and by deriving sharp results that were not previously available. The key to our analysis is to show that SCORE has the {\it Non-Splitting Property (NSP)}. Primarily because of an intractable rotation of eigenvectors dictated by the Davis-Kahan $\sin(\theta)$ theorem, the NSP is non-trivial to prove and requires new techniques that we develop.
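The stepwise scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `stepwise_estimate_k`, the callables `detect` and `gof_stat`, and the parameters `alpha` and `max_m` are all hypothetical, and the stopping rule (return the first $m$ whose GoF statistic falls at or below the upper-$\alpha$ quantile of $N(0,1)$) is an assumption suggested by the stated limiting behavior, not a rule given in the abstract. The community detection step (e.g., an adaptation of SCORE) and the GoF metric are left as user-supplied functions.

```python
from statistics import NormalDist
from typing import Callable, Optional, Sequence

Labels = Sequence[int]


def stepwise_estimate_k(
    adjacency: object,
    detect: Callable[[object, int], Labels],
    gof_stat: Callable[[object, Labels, int], float],
    alpha: float = 0.05,
    max_m: int = 20,
) -> Optional[int]:
    """Alternate a detection step and a GoF step for m = 1, 2, ..., max_m.

    `detect` and `gof_stat` are hypothetical stand-ins for the community
    detection step and the GoF metric. The stopping rule is an assumption:
    stop at the first m whose statistic is at or below the upper-alpha
    quantile of N(0, 1).
    """
    # Upper-alpha quantile of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    for m in range(1, max_m + 1):
        # Community detection step: partition the nodes into m groups.
        labels = detect(adjacency, m)
        # Goodness-of-fit step: score the fitted m-community model.
        stat = gof_stat(adjacency, labels, m)
        if stat <= z_alpha:
            return m
    # No m up to max_m passed the goodness-of-fit check.
    return None
```

Because the statistic is expected to be very large for every $m < K$ and approximately standard normal at $m = K$, the loop under this assumed rule would typically stop at $m = K$. Passing the two steps as callables keeps the sketch independent of any particular clustering routine or test statistic.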