In many data science problems, we are interested in inferring causal relationships in the data-generating mechanism. Here, we consider the following real-world question: how has the Colombian conflict influenced tropical forest loss? There is evidence for both enhancing and reducing impacts. Answering such questions requires causal models. In this work, we propose a class of causal models for spatio-temporal stochastic processes. This class allows us to formally define and quantify the causal effect of a vector of covariates $X$ on a real-valued response $Y$, even if the causal background knowledge is incomplete. We introduce a procedure for estimating causal effects and a non-parametric hypothesis test of whether these effects are zero. The proposed methods do not make strong distributional assumptions and allow for arbitrarily many latent confounders, provided that these confounders do not vary across time (or, alternatively, do not vary across space). Applying our causal methodology to the problem of conflict and forest loss, using data from 2000 to 2018, we find a reducing but statistically insignificant causal effect of conflict on forest loss. Regionally, both enhancing and reducing effects can be identified. Our theoretical findings are supported by simulations, and code is available online.
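To make the role of time-invariant latent confounders concrete, the following toy simulation may help. It is only a sketch under strong assumptions and is not the authors' procedure or code: the linear model, the scalar covariate, the coefficient `beta`, and the function names `simulate`, `pooled_slope`, and `averaged_within_location_slope` are all hypothetical choices made for illustration. Each location has a hidden variable that is constant over time and affects both $X$ and $Y$. Regressing $Y$ on $X$ over time separately at each location, and then averaging the slopes over space, removes the influence of that hidden variable, whereas a single pooled regression does not.

```python
import numpy as np


def simulate(n_locations=200, n_times=50, beta=-0.5, seed=0):
    """Toy data: a latent confounder h varies over space but not over time."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(n_locations, 1))            # hidden, constant in time
    x = h + rng.normal(size=(n_locations, n_times))  # covariate depends on h
    y = beta * x + 2.0 * h + rng.normal(size=(n_locations, n_times))
    return x, y


def pooled_slope(x, y):
    """OLS slope of y on x, pooling all locations and time points.

    In this toy model the pooled slope is confounded by h
    (its population value is beta + 1, not beta).
    """
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def averaged_within_location_slope(x, y):
    """OLS slope of y on x over time at each location, averaged over space.

    Centering within each location absorbs anything constant in time,
    including the latent confounder h.
    """
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    slopes = (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)
    return float(slopes.mean())


if __name__ == "__main__":
    x, y = simulate()
    print("pooled slope:           ", pooled_slope(x, y))
    print("averaged location slope:", averaged_within_location_slope(x, y))
```

In this assumed setup the averaged within-location slope lands close to `beta`, while the pooled slope is pulled away from it by the hidden variable. The symmetric case, where confounders vary over time but not over space, would swap the roles of the two axes.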