The simultaneous availability of experimental and observational data to estimate a treatment effect is both an opportunity and a statistical challenge: combining the information from both data sources is a promising way to build on the internal validity of randomized controlled trials (RCTs) and the greater external validity of observational data, but it raises methodological issues, especially because different sampling designs induce distributional shifts. We focus on transporting a causal effect estimated on an RCT to a target population described by a set of covariates. Available methods such as inverse propensity weighting are not designed to handle missing values, which are nevertheless common in both data sources. Beyond coupling the assumptions for causal identifiability with those for the missing-values mechanism and defining appropriate strategies, one has to account for the specific structure of the data: two sources, with treatment and outcome available only in the RCT. We study different approaches and their underlying assumptions on the data-generating processes and the distribution of missing values, and suggest several adapted methods, in particular multiple imputation strategies. These methods are assessed in an extensive simulation study, and practical guidelines are provided for different scenarios. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and a multicenter RCT studying the effect of tranexamic acid administration on mortality. The analysis illustrates how the handling of missing values can affect the conclusion about the effect transported from the RCT to the target population.
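To make the transport problem concrete, the following is a minimal sketch of inverse-propensity-style reweighting on fully observed toy data. It is not the estimator or the data studied in this work: it assumes a single discrete covariate, a known treatment probability of 0.5 in the trial, and no missing values, and every name in it (`ipsw_ate`, `naive_ate`, `trial`, `target_x`) is hypothetical.

```python
from collections import Counter

def naive_ate(trial):
    """Difference in mean outcomes between treated and control trial units."""
    treated = [y for _, a, y in trial if a == 1]
    control = [y for _, a, y in trial if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipsw_ate(trial, target_x, e=0.5):
    """Reweight trial units so their covariate distribution matches the target.

    trial    : list of (x, a, y) tuples (covariate, treatment, outcome)
    target_x : list of covariate values observed in the target sample
    e        : treatment probability in the trial (assumed known)
    """
    n, m = len(trial), len(target_x)
    trial_freq = Counter(x for x, _, _ in trial)
    target_freq = Counter(target_x)
    total = 0.0
    for x, a, y in trial:
        # Weight = target share of stratum x / trial share of stratum x.
        w = (target_freq[x] / m) / (trial_freq[x] / n)
        total += w * (a * y / e - (1 - a) * y / (1 - e))
    return total / n

# Toy trial: the effect is 1 when x == 0 and 3 when x == 1,
# and x == 1 is under-represented in the trial (2 of 8 units).
trial = [
    (0, 1, 1.0), (0, 1, 1.0), (0, 1, 1.0),
    (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 3.0), (1, 0, 0.0),
]
# Toy target sample: x == 1 makes up half of the population.
target_x = [0, 0, 1, 1]

print(naive_ate(trial))           # 1.5, the effect in the trial population
print(ipsw_ate(trial, target_x))  # approximately 2.0, the transported effect
```

In this toy setting the trial under-represents the stratum with the larger effect, so the reweighted estimate is larger than the within-trial one. The sketch only works because every covariate value is observed; with missing covariates the stratum frequencies, and hence the weights, are no longer directly computable.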