Assessing homophily in large-scale networks is central to understanding structural regularities in graphs, and thus informs the choice of models (such as graph neural networks) adopted to learn from network data. Evaluating smoothness metrics requires access to the entire network topology and node features, which may be impractical in large-scale, dynamic, resource-limited, or privacy-constrained settings. In this work, we propose a sampling-based framework to estimate homophily via the Dirichlet energy (Laplacian-based total variation) of graph signals, leveraging the Horvitz-Thompson (HT) estimator for unbiased inference from partial graph observations. The Dirichlet energy is a total, in the survey-sampling sense, of squared nodal feature deviations over graph edges; hence, it is estimable under general network sampling designs for which edge-inclusion probabilities can be analytically derived and used as weights in the proposed HT estimator. We establish that the Dirichlet energy can be consistently estimated from sampled graphs, and we empirically study other heterophily measures as well. Experiments on several heterophilic benchmark datasets demonstrate that the proposed HT estimators reliably capture homophilic structure (or the lack thereof) from sampled network measurements.
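To make the estimator concrete, the following is a minimal sketch rather than the authors' implementation. It assumes Bernoulli edge sampling, where every edge is retained independently with a known probability `p`, so each edge-inclusion probability equals `p`. The HT estimate then sums the squared feature differences over the sampled edges, each divided by its inclusion probability. The function names, the toy ring graph, and the parameter values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def dirichlet_energy(edges, x):
    """Exact Dirichlet energy: sum over edges (i, j) of ||x_i - x_j||^2.

    edges: integer array of shape (m, 2); x: float array of shape (n, d).
    """
    diffs = x[edges[:, 0]] - x[edges[:, 1]]
    return float(np.sum(diffs ** 2))


def ht_dirichlet_energy(sampled_edges, x, incl_prob):
    """Horvitz-Thompson estimate of the Dirichlet energy.

    Each sampled edge contributes its squared feature difference divided by
    its inclusion probability (a scalar, or an array of shape (m_sampled,)).
    """
    diffs = x[sampled_edges[:, 0]] - x[sampled_edges[:, 1]]
    per_edge = np.sum(diffs ** 2, axis=1)
    return float(np.sum(per_edge / incl_prob))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Hypothetical toy graph: a ring on n nodes with 3-dimensional features.
    n = 200
    edges = np.stack([np.arange(n), (np.arange(n) + 1) % n], axis=1)
    x = rng.normal(size=(n, 3))

    # Assumed design: Bernoulli edge sampling with retention probability p.
    p = 0.3
    estimates = []
    for _ in range(2000):
        keep = rng.random(len(edges)) < p
        estimates.append(ht_dirichlet_energy(edges[keep], x, p))

    print("exact energy:    ", dirichlet_energy(edges, x))
    print("mean HT estimate:", np.mean(estimates))
```

Averaged over many independent samples, the HT estimates should concentrate around the exact energy, which is what unbiasedness means here. Other sampling designs would change only the inclusion probabilities passed as `incl_prob`, provided they can be derived for the design in question.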