Reweighting the RCT for generalization: finite sample error and variable selection

Randomized Controlled Trials (RCTs) may suffer from limited scope. In particular, samples may be unrepresentative: some RCTs over- or under-sample individuals with certain characteristics compared to the target population, for which one wants conclusions on treatment effectiveness. Reweighting trial individuals to match the target population can improve the treatment effect estimation. In this work, we establish exact expressions for the bias and variance of such reweighting procedures, also called Inverse Propensity of Sampling Weighting (IPSW), in the presence of categorical covariates and for any sample size. These results allow us to compare the theoretical performance of different versions of IPSW estimates. Our results also show how the performance (bias, variance, and quadratic risk) of IPSW estimates depends on the two sample sizes (RCT and target population). A by-product of our work is a proof of consistency of IPSW estimates. The results further reveal that IPSW performance improves when the probability of being treated in the trial is estimated (rather than using its oracle counterpart). In addition, we study the choice of variables: how including covariates that are not necessary for identifiability of the causal effect may impact the asymptotic variance. Including covariates that are shifted between the two samples but are not treatment effect modifiers increases the variance, while including covariates that are treatment effect modifiers but are not shifted does not. We illustrate all the takeaways in a didactic example and in a semi-synthetic simulation inspired by critical care medicine.
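To make the reweighting idea concrete, here is a minimal Python sketch of one common form of an IPSW estimate with a single categorical covariate. The abstract does not state the estimator's formula, so the form used here is an assumption: each trial unit is weighted by the ratio of the empirical covariate frequency in the target sample to that in the trial, and the treatment probability `pi` is treated as known (the oracle version). The function name `ipsw_estimate` and all argument names are illustrative only.

```python
from collections import Counter


def ipsw_estimate(x_trial, a_trial, y_trial, x_target, pi=0.5):
    """Sketch of an IPSW estimate of the target-population treatment effect.

    Assumptions (not taken from the source): one categorical covariate,
    binary treatment a in {0, 1}, and a known trial treatment probability pi.
    """
    n, m = len(x_trial), len(x_target)
    count_trial = Counter(x_trial)    # category counts in the RCT sample
    count_target = Counter(x_target)  # category counts in the target sample

    total = 0.0
    for x, a, y in zip(x_trial, a_trial, y_trial):
        # Estimated density ratio: target frequency over trial frequency.
        # Categories absent from the target sample get weight 0.
        weight = (count_target[x] / m) / (count_trial[x] / n)
        # Horvitz-Thompson style contrast between treated and control units.
        contrast = y * (a / pi - (1 - a) / (1 - pi))
        total += weight * contrast
    return total / n


# Toy data: category "b" is under-represented in the trial
# relative to the target sample.
x_trial = ["a", "a", "b", "b"]
a_trial = [1, 0, 1, 0]
y_trial = [2.0, 1.0, 5.0, 1.0]
x_target = ["a", "b", "b", "b"]

print(ipsw_estimate(x_trial, a_trial, y_trial, x_target))  # 3.25
```

In this toy data the within-category effects are 1 for "a" and 4 for "b", and the target sample is 25% "a" and 75% "b", so the reweighted estimate is 0.25 × 1 + 0.75 × 4 = 3.25. The unweighted trial average would be 2.5. Replacing `pi` with the treated proportion observed within each category would give the estimated-probability variant that the abstract reports as performing better.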
