Networks continue to be of great interest to statisticians, with an emphasis on community detection. Less work, however, has addressed this question: given some network, does it exhibit meaningful community structure? We propose to answer this question in a principled manner by framing it as a statistical hypothesis in terms of a formal and general homophily metric. Homophily is a well-studied network property in which intra-community edges are more likely than between-community edges. We use the homophily metric to identify and distinguish between three concepts: nominal, collateral, and intrinsic homophily. We propose a simple and interpretable test statistic leveraging this homophily parameter and formulate both asymptotic and bootstrap-based rejection thresholds. We prove its asymptotic properties and demonstrate that it outperforms benchmark methods on both simulated and real-world data. Furthermore, the proposed method yields rich, provocative insights on four classic data sets; namely, that many well-studied networks do not actually have intrinsic homophily.