With increasing data availability, causal effects can be evaluated across different data sets, both randomized controlled trials (RCTs) and observational studies. RCTs isolate the effect of the treatment from that of unwanted (confounding) co-occurring effects, but their samples may be unrepresentative of the target population, and thus they may lack external validity. On the other hand, large observational samples are often more representative of the target population but can conflate confounding effects with the effect of the treatment of interest. In this paper, we review the growing literature on methods for causal inference that combine RCTs and observational studies, striving for the best of both worlds. We first discuss identification and estimation methods that improve the generalizability of RCTs using the representativeness of observational data. Classical estimators include weighting, the difference between conditional outcome models, and doubly robust estimators. We then discuss methods that combine RCTs and observational data either to ensure unconfoundedness of the observational analysis or to improve (conditional) average treatment effect estimation. We also connect and contrast works developed in both the potential outcomes literature and the structural causal model literature. Finally, we compare the main methods using a simulation study and real-world data to analyze the effect of tranexamic acid on the mortality rate in major trauma patients. A review of available code and new implementations is also provided.
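To make the first two classical estimators concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single binary effect modifier, a known randomization probability of 0.5 in the trial, and simulated data whose numbers are invented for illustration. The function names `simulate`, `g_formula`, and `ipsw` are hypothetical. The weighting estimator reweights trial units by the ratio of covariate frequencies in the observational sample to those in the trial; the outcome-model estimator averages the stratum-specific treatment effects over the observational sample.

```python
import numpy as np


def simulate(seed=0, n_rct=20000, n_obs=20000):
    """Toy data (all values are assumptions made for illustration)."""
    rng = np.random.default_rng(seed)
    # Binary effect modifier X: rarer in the trial than in the target population.
    x_rct = rng.binomial(1, 0.2, n_rct)
    x_obs = rng.binomial(1, 0.7, n_obs)  # observational sample: covariates only
    a = rng.binomial(1, 0.5, n_rct)      # randomized treatment, P(A=1) = 0.5
    # The treatment effect is 1 when X = 0 and 3 when X = 1.
    y = 0.5 * x_rct + a * (1.0 + 2.0 * x_rct) + rng.normal(0.0, 1.0, n_rct)
    return x_rct, a, y, x_obs


def g_formula(x_rct, a, y, x_obs):
    """Difference between conditional outcome models, averaged over the target sample."""
    # With a binary covariate, the outcome models reduce to stratum-by-arm means.
    tau = np.array([
        y[(x_rct == v) & (a == 1)].mean() - y[(x_rct == v) & (a == 0)].mean()
        for v in (0, 1)
    ])
    return tau[x_obs].mean()


def ipsw(x_rct, a, y, x_obs, e=0.5):
    """Weighting: reweight trial units so their covariate mix matches the target sample."""
    p_obs = np.array([np.mean(x_obs == v) for v in (0, 1)])
    p_rct = np.array([np.mean(x_rct == v) for v in (0, 1)])
    w = (p_obs / p_rct)[x_rct]  # estimated density ratio for each trial unit
    # Horvitz-Thompson contrast using the known randomization probability e.
    return np.mean(w * (a * y / e - (1 - a) * y / (1 - e)))


if __name__ == "__main__":
    x_rct, a, y, x_obs = simulate()
    naive = y[a == 1].mean() - y[a == 0].mean()
    print("trial-only difference in means:", round(naive, 2))
    print("outcome-model estimate:", round(g_formula(x_rct, a, y, x_obs), 2))
    print("weighting estimate:", round(ipsw(x_rct, a, y, x_obs), 2))
```

In this toy setup the average effect in the target population is 1 + 2 × 0.7 = 2.4, while the trial population has an average effect of 1 + 2 × 0.2 = 1.4, so the trial-only contrast is expected to fall short of the target value and the two generalization estimators are expected to land near it. A doubly robust estimator would combine the two ingredients, but it is omitted here for brevity.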