Causal inference from observational data can be viewed as a missing data problem arising from a hypothetical population-scale randomized trial matched to the observational study. This links a target trial protocol with a corresponding generative predictive model for inference, providing a complete framework for transparent communication of causal assumptions and statistical uncertainty on treatment effects, without the need for counterfactuals. The intuitive foundation for the work is that a whole population randomized trial would provide answers to any observable causal question with certainty. Thus, our fundamental problem of causal inference is the missingness of the hypothetical target trial data, which we solve through repeated imputation from a generative predictive model conditioned on the observational data. Causal assumptions map to intuitive conditions on the transportability of predictive models across populations and conditions. We demonstrate our approach on a real data application to studying the effects of maternal smoking on birthweights using extensions of Bayesian additive regression trees and inverse probability weighting.
翻译:从观察数据得出的因果推断可被视为与观察研究相匹配的假设人口规模随机试验引起的缺失数据问题。这把目标试验协议与相应的遗传预测模型联系起来,为透明地通报因果关系假设和治疗效果统计不确定性提供一个完整的框架,而不必反事实。工作的直觉基础是,整个人口随机试验将明确回答任何可观察到的因果问题。因此,我们因果推断的根本问题是假设目标试验数据的缺失,我们通过以观察数据为条件的基因预测模型反复估算,解决了这些数据。关于预测模型跨人口和条件的可传输性的不直观假设图。我们展示了我们采用实际数据应用的方法,利用Bayesian累加回归树的延伸和反概率加权法研究孕产妇吸烟对出生体重的影响。