Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose double machine learning (DML), a procedure in which flexible models are used to regress both the exposure and outcome variables on confounders to arrive at a causal estimator with favorable robustness properties and convergence rates, and we prove that this approach is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for both identification and estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
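The residual-on-residual idea behind DML can be illustrated with a minimal sketch. This is not the estimator developed in the paper: for brevity it assumes a partially linear model with a constant exposure effect, which is exactly the kind of homogeneity the paper avoids, and it assumes the unmeasured confounder is a smooth function of spatial location, so that location can stand in for it. The polynomial basis, the helper names (`poly_features`, `fit_predict`, `dml_plm`), and the simulated data are all hypothetical; any flexible learner could replace the least-squares fit.

```python
import numpy as np

def poly_features(s, degree=5):
    """Polynomial basis in 2-D coordinates s (n x 2), including an intercept."""
    x, y = s[:, 0], s[:, 1]
    cols = [x**i * y**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)

def fit_predict(F_train, t_train, F_test):
    """Least-squares fit on the training fold; a stand-in for any flexible learner."""
    beta, *_ = np.linalg.lstsq(F_train, t_train, rcond=None)
    return F_test @ beta

def dml_plm(s, a, y, n_folds=5, seed=0):
    """Cross-fitted DML for an ASSUMED partially linear model y = theta*a + g(s) + noise."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds       # balanced random fold labels
    F = poly_features(s)
    a_res = np.empty(n)
    y_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisance fits use only the other folds (cross-fitting).
        a_res[test] = a[test] - fit_predict(F[train], a[train], F[test])
        y_res[test] = y[test] - fit_predict(F[train], y[train], F[test])
    theta = (a_res @ y_res) / (a_res @ a_res)  # regress outcome residual on exposure residual
    # Naive standard error that treats observations as independent;
    # under spatial dependence this is only a rough guide.
    psi = a_res * (y_res - theta * a_res)
    se = np.sqrt(np.mean(psi**2) / n) / np.mean(a_res**2)
    return theta, se

# Synthetic example (illustrative data only): u is an unmeasured spatial confounder.
rng = np.random.default_rng(1)
n = 2000
s = rng.uniform(size=(n, 2))
u = np.sin(2 * np.pi * s[:, 0]) + np.cos(2 * np.pi * s[:, 1])
a = u + rng.normal(size=n)                     # exposure depends on u
y = 2.0 * a + 3.0 * u + rng.normal(size=n)     # true effect of a on y is 2

naive = np.cov(a, y)[0, 1] / np.var(a, ddof=1) # unadjusted slope, confounded by u
theta_hat, se_hat = dml_plm(s, a, y)
print(f"naive slope: {naive:.2f}, DML estimate: {theta_hat:.2f} (SE {se_hat:.3f})")
```

In this simulated setting the unadjusted slope absorbs the effect of `u`, while the cross-fitted estimate removes the spatially structured part of both exposure and outcome before relating them. Fitting the nuisance regressions on held-out folds is what keeps overfitting in the flexible models from contaminating the final estimate.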