How to detect a small community in a large network is an interesting problem, including clique detection as a special case, where a naive degree-based $\chi^2$-test was shown to be powerful in the presence of an Erd\H{o}s-Renyi background. Using Sinkhorn's theorem, we show that the signal captured by the $\chi^2$-test may be a modeling artifact, and it may disappear once we replace the Erd\H{o}s-Renyi model by a broader network model. We show that the recent SgnQ test is more appropriate for such a setting. The test is optimal in detecting communities with sizes comparable to the whole network, but has never been studied for our setting, which is substantially different and more challenging. Using a degree-corrected block model (DCBM), we establish phase transitions of this testing problem concerning the size of the small community and the edge densities in small and large communities. When the size of the small community is larger than $\sqrt{n}$, the SgnQ test is optimal for it attains the computational lower bound (CLB), the information lower bound for methods allowing polynomial computation time. When the size of the small community is smaller than $\sqrt{n}$, we establish the parameter regime where the SgnQ test has full power and make some conjectures of the CLB. We also study the classical information lower bound (LB) and show that there is always a gap between the CLB and LB in our range of interest.
翻译:如何在大型网络中检测一个小社区是一个有趣的问题, 包括作为特例的分类检测 。 在这种特例中, 以天真的度为基础的 $\ chi<unk> 2$- 测试显示在Erd\H{ o}s- Renyi 的背景下, 测试力非常强大。 使用 Sinkhorn 的理论, 我们显示, 由 $\ <unk> 2$- 测试所捕捉的信号可能是一个建模工艺, 一旦我们用一个更大的网络模型取代 Erd\ H{ o}s- Renyi 模型, 它可能会消失。 我们显示, 最近的 SgnQ 测试更适合这样的设置。 在探测与整个网络相近的大小的社区时, 测试效果最优, 但从未为我们的设置。 使用 度修正的区块模型( DCB), 我们建立这个测试问题的阶段过渡, 有关小社区的规模和大社区的边缘。 当小社区之间的距离大于 $\\ n$, SgnQ 测试是最佳的缩测试方法, 。</s>