Due to the technological progress of the last decades, Community Detection has become a major topic in machine learning. However, there is still a huge gap between practical and theoretical results, as theoretically optimal procedures often lack a feasible implementation and vice versa. This paper aims to close this gap and presents a novel algorithm that is both numerically and statistically efficient. Our procedure uses a test of homogeneity to compute adaptive weights describing local communities. The approach was inspired by the Adaptive Weights Community Detection (AWCD) algorithm by Adamyan et al. (2019). This algorithm delivered some promising results on artificial and real-life data, but our theoretical analysis reveals its performance to be suboptimal on a stochastic block model. In particular, the involved estimators are biased and the procedure does not work for sparse graphs. We propose significant modifications, addressing both shortcomings and achieving a nearly optimal rate of strong consistency on the stochastic block model. Our theoretical results are illustrated and validated by numerical experiments.
翻译:由于过去几十年的技术进步,社区探测已成为机器学习的一个主要主题,然而,实际和理论结果之间仍然存在着巨大的差距,因为理论上的最佳程序往往缺乏可行的实施,反之亦然。本文件旨在缩小这一差距,并提出了一种数字和统计两方面都有效的新算法。我们的程序使用同质性测试来计算描述当地社区的适应性加权数。这种方法受阿达米扬等人(2019年)的适应性 Weights社区探测算法(AWCD)的启发。这一算法在人工和现实生活数据方面产生了一些有希望的结果,但我们的理论分析表明,其性能在随机区块模型上是次优的。特别是,所涉及的估计者是有偏向的,程序对稀有的图不起作用。我们建议进行重大修改,既解决缺陷,又在随机区块模型上实现几乎最佳的高度一致性。我们的理论结果通过数字实验加以说明和验证。