With increasing data availability, causal treatment effects can be evaluated across different datasets, both randomized controlled trials (RCTs) and observational studies. RCTs isolate the effect of the treatment from that of unwanted (confounding) co-occurring effects, but they may suffer from inclusion biases and thus lack external validity. Large observational samples, on the other hand, are often more representative of the target population but can conflate confounding effects with the treatment of interest. In this paper, we review the growing literature on methods for causal inference on combined RCTs and observational studies, striving for the best of both worlds. We first discuss identification and estimation methods that improve the generalizability of RCTs using the representativeness of observational data. Classical estimators include weighting, the difference between conditional outcome models, and doubly robust estimators. We then discuss methods that combine RCTs and observational data to improve (conditional) average treatment effect estimation, handling possible unmeasured confounding in the observational data. We also connect and contrast works developed in both the potential outcomes framework and the structural causal model framework. Finally, we compare the main methods using a simulation study and real-world data to analyze the effect of tranexamic acid on the mortality rate in major trauma patients. Code to implement many of the methods is provided.
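As a rough illustration of the "difference between conditional outcome models" idea mentioned above, the following minimal sketch fits one outcome model per treatment arm on simulated RCT data and averages the predicted difference over covariates drawn from a simulated target (observational) population. Everything here is an assumption made for illustration and is not taken from the paper's code: the data-generating process, the linear outcome models, and the helper names `fit_linear`, `predict_linear`, and `g_formula_target_ate` are all hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predictions from coefficients produced by fit_linear."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


def g_formula_target_ate(X_rct, a_rct, y_rct, X_obs):
    """Fit one outcome model per arm on the RCT, then average the
    predicted treated-minus-control difference over the target covariates."""
    coef_treated = fit_linear(X_rct[a_rct == 1], y_rct[a_rct == 1])
    coef_control = fit_linear(X_rct[a_rct == 0], y_rct[a_rct == 0])
    diff = predict_linear(coef_treated, X_obs) - predict_linear(coef_control, X_obs)
    return float(np.mean(diff))


# Simulated data (assumed setup): the covariate X modifies the treatment effect,
# and its distribution differs between the trial and the target population.
rng = np.random.default_rng(0)
n_rct, n_obs = 2000, 5000
X_rct = rng.normal(0.0, 1.0, size=(n_rct, 1))   # trial population, E[X] = 0
X_obs = rng.normal(1.0, 1.0, size=(n_obs, 1))   # target population, E[X] = 1
a_rct = rng.integers(0, 2, size=n_rct)          # randomized treatment assignment
noise = rng.normal(0.0, 1.0, size=n_rct)
# Conditional treatment effect is 2 + X, so the ATE is 2 in the trial and 3 in the target.
y_rct = 1.0 + X_rct[:, 0] + a_rct * (2.0 + X_rct[:, 0]) + noise

rct_ate = float(y_rct[a_rct == 1].mean() - y_rct[a_rct == 0].mean())
target_ate = g_formula_target_ate(X_rct, a_rct, y_rct, X_obs)
print(f"difference in means within the RCT: {rct_ate:.2f}")
print(f"outcome-model estimate for the target population: {target_ate:.2f}")
```

In this simulated setup the within-trial difference in means estimates the trial-population effect, while the outcome-model estimate moves toward the target-population effect because it averages over the observational covariate distribution. The sketch relies on correctly specified outcome models and on the covariate capturing all treatment-effect modifiers that shift between populations; weighting and doubly robust estimators relax these requirements in different ways.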