What can the millions of random treatments in nonexperimental data reveal about causes?

We propose a new method to estimate causal effects from nonexperimental data. Each pair of sample units is first associated with a stochastic 'treatment' (the differences in factors between the two units) and an effect (the resulting outcome difference). We then propose that all such pairs can be combined to provide more accurate estimates of causal effects in observational data, given a statistical model connecting combinatorial properties of treatments to the accuracy and unbiasedness of their effects. The article introduces one such model and a Bayesian approach to combine the $O(n^2)$ pairwise observations typically available in nonexperimental data. This also leads to an interpretation of nonexperimental datasets as incomplete, or noisy, versions of ideal factorial experimental designs. This approach to causal effect estimation has several advantages: (1) it expands the number of observations, converting thousands of individuals into millions of observational treatments; (2) starting with treatments closest to the experimental ideal, it identifies noncausal variables that can be ignored in the future, making estimation easier in each subsequent iteration while departing minimally from experiment-like conditions; (3) it recovers individual causal effects in heterogeneous populations. We evaluate the method in simulations and on the National Supported Work (NSW) program, an intensively studied program whose effects are known from randomized field experiments. We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and an order-of-magnitude larger supersample with the entire national program data, outperforming statistical, econometric, and machine learning estimators in all cases...
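The pairing step described in the abstract can be illustrated with a minimal sketch. It shows only how pairs of units might be turned into (treatment, effect) observations; it is not the authors' statistical model or their Bayesian combination step. The function name `pairwise_treatments`, the record layout, and the four toy units are assumptions made up for illustration, not data from the NSW program.

```python
from itertools import combinations


def pairwise_treatments(units, factors, outcome):
    """Build one (treatment, effect) record per unordered pair of units.

    treatment: dict of factor -> difference (second unit minus first),
               keeping only the factors on which the two units differ.
    effect:    outcome difference (second unit minus first).
    Hypothetical helper for illustration only.
    """
    records = []
    for a, b in combinations(units, 2):
        treatment = {f: b[f] - a[f] for f in factors if b[f] != a[f]}
        effect = b[outcome] - a[outcome]
        records.append((treatment, effect))
    return records


# Toy, made-up units (not real data).
units = [
    {"trained": 0, "age": 25, "earnings": 1000.0},
    {"trained": 1, "age": 25, "earnings": 1500.0},
    {"trained": 1, "age": 30, "earnings": 1800.0},
    {"trained": 0, "age": 30, "earnings": 1200.0},
]

pairs = pairwise_treatments(units, ["trained", "age"], "earnings")

# Pairs that differ in exactly one factor are the ones closest to a
# controlled comparison, where only a single factor changes.
single_factor = [(t, e) for t, e in pairs if len(t) == 1]

print(len(pairs), "pairs,", len(single_factor), "differ in a single factor")
```

With $n$ units the loop yields $n(n-1)/2$ pairs, which is the $O(n^2)$ growth the abstract refers to: 2,000 individuals would give 1,999,000 pairwise observations.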
