Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world data sets.
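To make the second framework concrete, the following is a minimal sketch of a bootstrap test against a baseline model. It is an illustration under stated assumptions, not the statistic or parameter defined in the paper: the abstract does not specify them, so the sketch uses a hypothetical `density_gap` statistic (within-community minus between-community edge density, scaled by overall density), assumes the community labels are given rather than estimated, and assumes an Erdős–Rényi graph with matched edge density as the baseline model. The function names `density_gap` and `bootstrap_test` are illustrative only.

```python
import numpy as np


def density_gap(adj, labels):
    """Hypothetical statistic: (p_in - p_out) / p_bar for an undirected graph."""
    adj = np.asarray(adj)
    labels = np.asarray(labels)
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)            # count each undirected pair once
    edges = adj[iu]
    same = labels[iu[0]] == labels[iu[1]]   # True for within-community pairs
    if same.all() or not same.any():
        raise ValueError("labels must give both within- and between-community pairs")
    p_in = edges[same].mean()               # within-community edge density
    p_out = edges[~same].mean()             # between-community edge density
    p_bar = edges.mean()                    # overall edge density
    if p_bar == 0:
        return 0.0                          # empty graph: no structure to measure
    return (p_in - p_out) / p_bar


def bootstrap_test(adj, labels, n_boot=500, seed=0):
    """Compare the observed statistic with draws from an assumed Erdős–Rényi baseline."""
    rng = np.random.default_rng(seed)
    adj = np.asarray(adj)
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)
    p_hat = adj[iu].mean()                  # baseline edge probability from the data
    observed = density_gap(adj, labels)

    null_stats = np.empty(n_boot)
    for b in range(n_boot):
        sim = np.zeros((n, n), dtype=int)
        sim[iu] = rng.random(iu[0].size) < p_hat   # independent edges, probability p_hat
        sim = sim + sim.T                          # symmetrize, diagonal stays zero
        null_stats[b] = density_gap(sim, labels)

    # One-sided p-value with the usual +1 correction so it is never exactly zero.
    p_value = (1 + np.sum(null_stats >= observed)) / (n_boot + 1)
    return observed, p_value
```

A small p-value here indicates only that the observed gap is larger than what the assumed baseline typically produces for these fixed labels. A test that estimates the labels from the data would need to repeat that estimation on every bootstrap draw.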