How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference

Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely because few observational data sets exist for which the true treatment effect is known. We describe and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). OSRCT can be used to construct observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for empirical evaluation of causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We then perform a large-scale evaluation of seven causal inference methods over 37 data sets drawn from RCTs, simulators, real-world computational systems, and observational data sets augmented with a synthetic response variable. We find notable performance differences when comparing across data from different sources, demonstrating the importance of using data from a variety of sources when evaluating any causal inference method.
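To make the evaluation idea concrete, the sketch below builds a constructed observational data set from a synthetic RCT and scores two estimators against the RCT reference. It is a minimal illustration under stated assumptions, not the paper's code: the dict-based unit format, the logistic biasing function of a single binary covariate `c`, and the names `osrct_sample`, `stratified_ate`, and `make_rct` are all hypothetical. The acceptance rule (keep a unit only when its randomized treatment equals a treatment drawn from a covariate-dependent probability) is one way to make treatment depend on a covariate in the retained sample, which is the kind of biased subsampling OSRCT performs.

```python
import math
import random
from statistics import mean


def osrct_sample(units, bias_strength=2.0, seed=0):
    """Subsample RCT units so that treatment depends on the covariate 'c'.

    Hypothetical sketch: each unit is a dict with a binary covariate 'c',
    a randomized treatment 't' (0 or 1), and an outcome 'y'.
    """
    rng = random.Random(seed)
    kept = []
    for u in units:
        # Assumed biasing function: logistic in c, so c = 1 favors treatment.
        p_treat = 1.0 / (1.0 + math.exp(-bias_strength * (u["c"] - 0.5)))
        sampled_t = 1 if rng.random() < p_treat else 0
        # Keep the unit only if its randomized treatment matches the sampled one.
        if sampled_t == u["t"]:
            kept.append(u)
    return kept


def diff_in_means(units):
    """Naive ATE estimate: mean outcome of treated minus mean of controls."""
    treated = [u["y"] for u in units if u["t"] == 1]
    control = [u["y"] for u in units if u["t"] == 0]
    return mean(treated) - mean(control)


def stratified_ate(units):
    """Adjusted estimate: within-stratum differences weighted by stratum size."""
    total = len(units)
    estimate = 0.0
    for c in (0, 1):
        stratum = [u for u in units if u["c"] == c]
        estimate += len(stratum) / total * diff_in_means(stratum)
    return estimate


def make_rct(n=20000, true_effect=1.0, seed=1):
    """Synthetic RCT: treatment is a fair coin flip, c raises the outcome by 3."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < 0.5 else 0
        y = true_effect * t + 3.0 * c + rng.gauss(0.0, 1.0)
        units.append({"c": c, "t": t, "y": y})
    return units


rct = make_rct()
reference = diff_in_means(rct)   # unbiased reference estimate from the full RCT
obs = osrct_sample(rct)          # constructed observational data set
naive = diff_in_means(obs)       # ignores confounding by c, so biased upward
adjusted = stratified_ate(obs)   # adjusts for c; should be close to reference
print(f"reference={reference:.2f} naive={naive:.2f} adjusted={adjusted:.2f}")
```

With these settings, treated units in the constructed sample have `c = 1` about 73% of the time versus about 27% for controls, so the naive estimate should sit near 1 + 3 × 0.46 ≈ 2.4 in expectation, while the stratified estimate should stay close to the RCT reference of about 1. A causal inference method can then be scored by its error relative to `reference`, for example `abs(adjusted - reference)`.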
