Causal inference for network data is an area of active interest in the social sciences. Unfortunately, the complicated dependence structure of network data presents an obstacle to many causal inference procedures. We consider the task of mediation analysis for network data and present a model in which mediation occurs in a latent embedding space. Under this model, node-level interventions have causal effects on nodal outcomes, and these effects can be partitioned into a direct effect that is independent of the network and an indirect effect induced by homophily. To estimate network-mediated effects, we embed nodes into a low-dimensional space and fit two regression models: (1) an outcome model describing how nodal outcomes vary with treatment, controls, and position in latent space; and (2) a mediator model describing how latent positions vary with treatment and controls. We prove that the estimated coefficients are asymptotically normal about the true coefficients under a sub-gamma generalization of the random dot product graph, a widely used latent space model. We show that these coefficients can be used in product-of-coefficients estimators for causal inference. Our method is easy to implement, scales to networks with millions of edges, and can be extended to accommodate a variety of structured data.
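The estimation pipeline described above (embed, fit the outcome model, fit the mediator model, multiply coefficients) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the following:

- The adjacency spectral embedding (ASE) is used as the embedding step.
- Both regression models are fit by ordinary least squares.
- The embedding dimension `d` is known.
- The data are simulated from an ordinary Bernoulli random dot product graph rather than the sub-gamma generalization.

The helper names `adjacency_spectral_embedding`, `ols`, and `estimate_effects` are hypothetical, as is every simulation parameter. The sketch returns point estimates only, with no standard errors.

```python
import numpy as np


def adjacency_spectral_embedding(A: np.ndarray, d: int) -> np.ndarray:
    """Hypothetical helper: top-d eigenvectors of symmetric A (ranked by |eigenvalue|),
    each scaled by the square root of its |eigenvalue|."""
    eigvals, eigvecs = np.linalg.eigh(A.astype(float))
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))


def ols(design: np.ndarray, response: np.ndarray) -> np.ndarray:
    """Least-squares coefficients; `response` may hold one column per outcome."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def estimate_effects(X_hat: np.ndarray, y: np.ndarray,
                     treatment: np.ndarray, controls: np.ndarray) -> tuple[float, float]:
    """Hypothetical helper: direct effect and product-of-coefficients indirect effect."""
    n = len(y)
    base = np.column_stack([np.ones(n), treatment, controls])  # intercept, treatment, controls
    k = base.shape[1]
    # (1) Outcome model: y ~ 1 + treatment + controls + latent position.
    beta = ols(np.column_stack([base, X_hat]), y)
    direct = float(beta[1])
    gamma = beta[k:]  # effect of each latent dimension on the outcome
    # (2) Mediator model: latent position ~ 1 + treatment + controls, one fit per dimension.
    theta = ols(base, X_hat)  # shape (k, d)
    theta_treat = theta[1]  # effect of treatment on each latent dimension
    indirect = float(theta_treat @ gamma)
    return direct, indirect


# Synthetic example (all parameters hypothetical): a Bernoulli RDPG in d = 2
# whose latent positions shift with a binary treatment and one control.
rng = np.random.default_rng(0)
n, d = 800, 2
treatment = rng.binomial(1, 0.5, size=n).astype(float)
controls = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([
    0.62 + 0.10 * treatment + 0.05 * controls + rng.uniform(-0.05, 0.05, n),
    -0.30 + 0.50 * treatment + 0.05 * controls + rng.uniform(-0.15, 0.15, n),
])
P = X @ X.T  # edge probabilities; every entry lies in (0, 1) for these positions
upper = np.triu(rng.binomial(1, P), k=1)
A = (upper + upper.T).astype(float)  # symmetric adjacency matrix, no self-loops
y = 1.0 + 0.5 * treatment + 0.3 * controls + X @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.5, n)

X_hat = adjacency_spectral_embedding(A, d)
direct, indirect = estimate_effects(X_hat, y, treatment, controls)
# Population values in this simulation: direct 0.5, indirect 0.10*1.0 + 0.50*2.0 = 1.1.
print(f"direct = {direct:.3f}, indirect = {indirect:.3f}")
```

Taking the inner product of the two coefficient vectors is convenient because ASE identifies latent positions only up to an orthogonal transformation `Q` (sign flips included). Replacing `X_hat` by `X_hat @ Q` changes the mediator coefficients from `theta` to `theta @ Q` and the outcome coefficients from `gamma` to `Q.T @ gamma`. Their product therefore stays the same, and so does the direct-effect coefficient. At this sample size the estimates need not match the population values closely, partly because `X_hat` is itself a noisy estimate of `X`.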