Network data commonly consist of observations from a single large network, so researchers often partition the network into clusters in order to apply cluster-robust inference methods. Existing methods of this kind require clusters to be asymptotically independent. Under mild conditions, we prove that, for this requirement to hold for network-dependent data, it is necessary and sufficient that clusters have low conductance, the ratio of edge boundary size to volume. This yields a simple measure of cluster quality. We find in simulations that when clusters have low conductance, cluster-robust methods control size better than HAC estimators. However, for important classes of networks lacking low-conductance clusters, cluster-robust methods can exhibit substantial size distortion. Drawing on results in spectral graph theory that connect conductance to the spectrum of the graph Laplacian, we propose to use the spectrum to determine the number of low-conductance clusters and spectral clustering to construct them.
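To make the cluster-quality measure concrete, the following is a minimal sketch of computing the conductance of a candidate cluster. It assumes an undirected, unweighted network without self-loops, and it takes the volume of a cluster to be the sum of the degrees of its nodes, which is one common convention and not necessarily the exact definition used in the paper. The names `conductance`, `build_adjacency`, `EDGES`, and `ADJ` are hypothetical and chosen for illustration only.

```python
def build_adjacency(edges):
    """Build a neighbor-set adjacency map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, cluster):
    """Return edge boundary size of `cluster` divided by its volume.

    adj: dict mapping each node to the set of its neighbors.
    cluster: iterable of nodes forming the candidate cluster.
    """
    members = set(cluster)
    # Edge boundary: edges with exactly one endpoint inside the cluster.
    boundary = sum(1 for u in members for v in adj[u] if v not in members)
    # Volume (assumption): sum of the degrees of the cluster's nodes.
    volume = sum(len(adj[u]) for u in members)
    if volume == 0:
        raise ValueError("cluster has zero volume")
    return boundary / volume


# Toy network: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ADJ = build_adjacency(EDGES)

well_separated = conductance(ADJ, {0, 1, 2})  # one boundary edge, volume 7
poorly_separated = conductance(ADJ, {0, 3, 4})  # five boundary edges, volume 7
```

In this toy network, the triangle {0, 1, 2} has conductance 1/7, while the cluster {0, 3, 4}, which cuts across both triangles, has conductance 5/7. A lower value indicates a cluster with few links to the rest of the network relative to its size, which is the property the abstract identifies as necessary and sufficient for cluster-robust methods to be valid.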