Networks continue to be of great interest to statisticians, with much of the emphasis on community detection. Less work, however, has addressed a prior question: given some network, does it exhibit meaningful community structure? We propose to answer this question in a principled manner by framing it as a statistical hypothesis in terms of a formal and general parameter related to homophily. Homophily is a well-studied network property in which within-community edges are more likely than between-community edges. We use this parameter to identify and distinguish between three concepts: nominal, collateral, and intrinsic homophily. We propose a simple and interpretable test statistic that leverages this parameter, and we formulate both asymptotic and bootstrap-based rejection thresholds. We prove the asymptotic properties of the test and demonstrate that it outperforms benchmark methods on both simulated and real-world data. Furthermore, the proposed method yields rich, provocative insights on four classic data sets; namely, that many well-studied networks do not actually have intrinsic homophily.
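As a rough illustration of the descriptive notion of homophily used above (within-community edges being more likely than between-community edges), the following sketch compares the two empirical edge densities on a toy graph. It is not the parameter or test statistic proposed in the paper, whose definition is not given in this abstract; the function name `edge_densities`, the toy adjacency matrix, and the community labels are all assumptions made for illustration.

```python
import numpy as np


def edge_densities(adj, labels):
    """Return (within, between) empirical edge densities.

    Assumes an undirected simple graph given as a symmetric 0/1 adjacency
    matrix, with one community label per node. Illustrative helper only;
    it is not the statistic proposed in the paper.
    """
    adj = np.asarray(adj)
    labels = np.asarray(labels)
    n = adj.shape[0]

    # Visit each unordered node pair exactly once (strict upper triangle).
    iu, ju = np.triu_indices(n, k=1)
    same_community = labels[iu] == labels[ju]
    has_edge = adj[iu, ju] > 0

    # Fraction of node pairs joined by an edge, split by pair type.
    within = has_edge[same_community].mean()
    between = has_edge[~same_community].mean()
    return float(within), float(between)


# Toy example (hypothetical data): two triangles joined by a single edge.
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
labels = [0, 0, 0, 1, 1, 1]

within, between = edge_densities(adj, labels)
print(f"within={within:.3f}, between={between:.3f}")
```

A large gap between the two densities is only descriptive evidence. Deciding whether such a gap reflects intrinsic rather than nominal or collateral homophily is the question the proposed hypothesis test is meant to address.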