Since network data commonly consists of observations on a single large network, researchers often partition the network into clusters in order to apply cluster-robust inference methods. All existing such methods require clusters to be asymptotically independent. We prove that for this requirement to hold, under certain conditions, it is necessary and sufficient for clusters to have low conductance, the ratio of edge boundary size to volume, which yields a measure of cluster quality. We show in simulations that, for important classes of networks lacking low-conductance clusters, cluster-robust methods can exhibit substantial size distortion, whereas for networks with such clusters, they outperform HAC estimators. To assess the existence of low-conductance clusters and construct them, we draw on results in spectral graph theory showing a close connection between conductance and the spectrum of the graph Laplacian. Based on these results, we propose to use the spectrum to compute the number of low-conductance clusters and spectral clustering to compute the clusters. We illustrate our results and proposed methods in simulations and empirical applications.
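To make the two central quantities concrete, the following is a minimal sketch, not the authors' implementation. It assumes an undirected, unweighted graph given as a dense adjacency matrix with no isolated nodes, takes conductance to be edge boundary size divided by the volume of the cluster (as described above), and uses the normalized graph Laplacian. The function names, the toy "two cliques joined by one edge" graph, and the eigenvalue cutoff of 0.5 are all illustrative assumptions; the two-way split by the sign of the second eigenvector is a simple stand-in for a full spectral clustering routine.

```python
import numpy as np


def conductance(A, S):
    """Edge boundary size of S divided by the volume of S (sum of degrees in S)."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    boundary = A[mask][:, ~mask].sum()  # edges leaving S
    volume = A[mask].sum()              # sum of degrees of nodes in S
    return boundary / volume


def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes every node has positive degree."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def count_small_eigenvalues(eigvals, cutoff):
    """Number of Laplacian eigenvalues below a user-chosen (hypothetical) cutoff."""
    return int(np.sum(eigvals < cutoff))


def two_way_spectral_split(A):
    """Split nodes by the sign of the eigenvector of the second-smallest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(A))  # ascending order
    return eigvals, eigvecs[:, 1] >= 0


def barbell_example():
    """Toy graph: two 4-node cliques (nodes 0-3 and 4-7) joined by the edge 3-4."""
    A = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[3, 4] = A[4, 3] = 1.0
    return A


A = barbell_example()
eigvals, labels = two_way_spectral_split(A)
k = count_small_eigenvalues(eigvals, cutoff=0.5)  # cutoff is an illustrative choice
phi = conductance(A, np.flatnonzero(labels))
print(f"eigenvalues below cutoff: {k}, conductance of recovered cluster: {phi:.4f}")
```

In this toy graph each clique has one boundary edge and volume 13, so the recovered cluster has low conductance, and only two eigenvalues fall below the cutoff. For a network without such clusters, the small eigenvalues would not separate from the rest in this way; how to choose the cutoff in practice is left open here.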