While a randomized controlled trial (RCT) readily measures the average treatment effect (ATE), this measure may need to be adjusted to generalize to a different target population. Standard estimators of the target population treatment effect account for the distributional shift in covariates, either through inverse propensity sampling weighting (IPSW) or by modeling the response with the g-formula. However, these estimators require covariates that are available in both the RCT and an observational sample, a condition that often very few covariates satisfy. Here we analyze how the classic estimators behave when covariates are missing in at least one of the two datasets, RCT or observational. In line with general identifiability conditions, these estimators remain consistent when they include only the treatment effect modifiers that are shifted in the target population. We compute the expected bias induced by a missing covariate, assuming Gaussian covariates and a linear model for the conditional ATE function. This enables sensitivity analysis for each missing covariate pattern, and the method is particularly useful because it gives the sign of the expected bias. We also show that there is no gain in imputing a partially unobserved covariate. Finally, we study the replacement of a missing covariate by a proxy, and the impact of imputation. We illustrate all these results on simulations, on semi-synthetic benchmarks using data from the Tennessee Student/Teacher Achievement Ratio (STAR), and on a real-world example from critical care medicine.
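To make the missing-covariate bias concrete, here is a minimal simulation sketch of g-formula generalization. It is not the authors' code. It assumes two unit-variance Gaussian covariates with correlation `rho`, a mean shift between trial and target population, and a linear conditional ATE. All names (`g_formula`, `run_simulation`, `delta`) and all parameter values are hypothetical choices made for illustration. Under these assumptions, omitting the second covariate leaves a bias (true value minus estimate) of `delta[1] * (shift[1] - rho * shift[0])`, whose sign is known in advance.

```python
import numpy as np


def g_formula(x_rct, a, y, x_target):
    """Fit one linear outcome model per arm on the RCT, then average the
    predicted treated-minus-control difference over the target sample."""
    def fit(x, y_arm):
        design = np.column_stack([np.ones(len(x)), x])
        coef, *_ = np.linalg.lstsq(design, y_arm, rcond=None)
        return coef

    coef_treated = fit(x_rct[a == 1], y[a == 1])
    coef_control = fit(x_rct[a == 0], y[a == 0])
    design_target = np.column_stack([np.ones(len(x_target)), x_target])
    return float(np.mean(design_target @ (coef_treated - coef_control)))


def run_simulation(n_rct=20000, n_target=20000, rho=0.5, seed=0):
    rng = np.random.default_rng(seed)

    # Assumed setup: same covariance in both populations, shifted means.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    mu_rct = np.array([0.0, 0.0])
    mu_target = np.array([1.0, 1.0])

    # Assumed linear conditional ATE: tau(x) = delta0 + delta @ x.
    delta0, delta = 1.0, np.array([1.0, 2.0])

    x_rct = rng.multivariate_normal(mu_rct, cov, size=n_rct)
    x_target = rng.multivariate_normal(mu_target, cov, size=n_target)
    a = rng.integers(0, 2, size=n_rct)  # randomized treatment assignment
    y = (x_rct.sum(axis=1)
         + a * (delta0 + x_rct @ delta)
         + rng.normal(size=n_rct))

    shift = mu_target - mu_rct
    return {
        # Population target ATE under the assumed model.
        "true": float(delta0 + delta @ mu_target),
        # Both covariates observed in both datasets.
        "full": g_formula(x_rct, a, y, x_target),
        # Second covariate missing: only the first one is used.
        "partial": g_formula(x_rct[:, :1], a, y, x_target[:, :1]),
        # Expected bias (true minus estimate) when covariate 2 is omitted.
        "expected_bias": float(delta[1] * (shift[1] - rho * shift[0])),
    }


if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: {value:.3f}")
```

With these illustrative values, the population target ATE is 4 and the estimate that omits the second covariate converges to 3, so the expected bias is 1. Setting `rho` to 1 in the bias expression makes it vanish, which illustrates why a strongly correlated observed covariate can partly compensate for a missing one.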