Randomized Controlled Trials (RCTs) are often considered the gold standard for estimating causal effects, but they may lack external validity when the population eligible for the RCT differs substantially from the target population. Having a sample of the target population of interest at hand allows us to generalize the causal effect. Identifying the treatment effect in the target population requires covariates that capture all treatment effect modifiers that are shifted between the two sets. Standard estimators then use either weighting (IPSW), outcome modeling (G-formula), or a combination of the two in doubly robust approaches (AIPSW). However, such covariates are often not available in both sets. In this paper, after proving L1-consistency of these three estimators, we compute the expected bias induced by a missing covariate, assuming a Gaussian distribution, a continuous outcome, and a semi-parametric model. Under this setting, we perform a sensitivity analysis for each missing covariate pattern and compute the sign of the expected bias. We also show that there is no gain in linearly imputing a partially-unobserved covariate. Finally, we study the substitution of a missing covariate by a proxy. We illustrate all these results with simulations, with semi-synthetic benchmarks using data from the Tennessee Student/Teacher Achievement Ratio (STAR), and with a real-world example from critical care medicine.
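As a rough illustration of the outcome-modeling (G-formula) idea and of the bias a missing effect modifier can induce, here is a minimal sketch on a toy simulation. The data-generating process, the constants (`TAU0`, `DELTA`, the covariate means), and all function names are assumptions made for this example and are not taken from the paper; IPSW and AIPSW are not shown.

```python
import random

# Illustrative constants (assumptions, not taken from the paper).
TAU0 = 2.0           # treatment effect when x = 0
DELTA = 3.0          # how strongly x modifies the treatment effect
TRIAL_MEAN_X = 1.0   # covariate mean in the RCT
TARGET_MEAN_X = 0.0  # covariate mean in the target population


def simulate_trial(n, rng):
    """Simulate a randomized trial: returns a list of (x, a, y) tuples."""
    rows = []
    for _ in range(n):
        x = rng.gauss(TRIAL_MEAN_X, 1.0)
        a = rng.randint(0, 1)  # fair-coin treatment assignment
        y = 1.0 + 0.5 * x + a * (TAU0 + DELTA * x) + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def g_formula(trial, target_xs):
    """Fit one outcome model per arm on the trial, then average the
    predicted treated-minus-control difference over the target covariates."""
    fits = {}
    for arm in (0, 1):
        xs = [x for x, a, _ in trial if a == arm]
        ys = [y for _, a, y in trial if a == arm]
        fits[arm] = fit_line(xs, ys)
    (b0, s0), (b1, s1) = fits[0], fits[1]
    return sum((b1 + s1 * x) - (b0 + s0 * x) for x in target_xs) / len(target_xs)


def difference_in_means(trial):
    """With x unavailable and intercept-only outcome models, the G-formula
    reduces to this: the trial's own average treatment effect."""
    treated = [y for _, a, y in trial if a == 1]
    control = [y for _, a, y in trial if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(0)
trial = simulate_trial(20_000, rng)
target_xs = [rng.gauss(TARGET_MEAN_X, 1.0) for _ in range(20_000)]

true_target_ate = TAU0 + DELTA * TARGET_MEAN_X
g_estimate = g_formula(trial, target_xs)
naive_estimate = difference_in_means(trial)

print(f"true target ATE:         {true_target_ate:.2f}")
print(f"G-formula with x:        {g_estimate:.2f}")
print(f"estimate with x missing: {naive_estimate:.2f}")
```

In this toy model, the estimate that omits `x` targets the trial's own effect, `TAU0 + DELTA * TRIAL_MEAN_X`, so its bias relative to the target population is `DELTA * (TRIAL_MEAN_X - TARGET_MEAN_X)`. The sign of that bias is set by the direction of the covariate shift and the sign of the effect modification.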