Evaluating Causal Inference Methods

The fundamental challenge of causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no "one-size-fits-all" causal method that performs optimally in every setting. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such data-generating procedures can be of limited value because they are typically stylized models of reality: they are simplified for tractability and lack the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for the data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework's novelty stems from its ability to generate synthetic data anchored at the empirical distribution of the observed sample, and therefore virtually indistinguishable from it. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. The simulated data sets are then used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence's ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and two real-world data applications from the Lalonde and Project STAR studies.
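The evaluation loop described above (generate data with a known causal effect and known confounding, then score estimators against that ground truth) can be illustrated with a minimal sketch. This is not Credence itself: a hand-written simulator stands in for the learned deep generative model, and every name here (`simulate`, `evaluate`, `tau`, `confounding`) is a hypothetical choice made for illustration only.

```python
import numpy as np

def simulate(n, rng, tau=2.0, confounding=1.5):
    """Draw synthetic data with a user-specified treatment effect `tau`.

    Assumption: a toy simulator replaces the learned generative model.
    """
    x = rng.normal(size=n)                       # one pre-treatment covariate
    p = 1.0 / (1.0 + np.exp(-confounding * x))   # treatment probability depends on x
    t = rng.binomial(1, p)                       # confounded treatment assignment
    y0 = x + rng.normal(size=n)                  # potential outcome under control
    y1 = y0 + tau                                # potential outcome under treatment
    y = np.where(t == 1, y1, y0)                 # only one outcome is observed per unit
    return x, t, y

def diff_in_means(x, t, y):
    """Naive estimator that ignores the covariate."""
    return y[t == 1].mean() - y[t == 0].mean()

def regression_adjustment(x, t, y):
    """OLS of y on [1, t, x]; the coefficient on t is the effect estimate."""
    design = np.column_stack([np.ones_like(x), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def evaluate(estimators, tau=2.0, n=2000, reps=100, seed=0):
    """Return the RMSE of each estimator against the known ground truth `tau`."""
    rng = np.random.default_rng(seed)
    errors = {name: [] for name in estimators}
    for _ in range(reps):
        x, t, y = simulate(n, rng, tau=tau)
        for name, estimator in estimators.items():
            errors[name].append(estimator(x, t, y) - tau)
    return {name: float(np.sqrt(np.mean(np.square(e)))) for name, e in errors.items()}

results = evaluate({
    "diff_in_means": diff_in_means,
    "regression_adjustment": regression_adjustment,
})
print(results)
```

Because the ground-truth effect is known by construction, the estimators can be ranked by error. In this toy setting the naive difference in means absorbs the confounding bias, while regression adjustment, which matches the linear outcome model used by the simulator, does not.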
