In this paper, we propose a propensity-score-adapted variable selection procedure for choosing which covariates to include in propensity score models, with the aim of eliminating confounding bias and improving statistical efficiency in observational studies. The approach is designed specifically for causal inference: it requires only that the propensity scores be $\sqrt{n}$-consistently estimated through a parametric model, and it does not require correct specification of the potential outcome models. By using the estimated propensity scores as inverse probability of treatment weights in an adaptive lasso fitted to the outcome, it excludes instrumental variables while including confounders and outcome predictors. We establish its oracle properties under the "linear association" conditions. We also present numerical simulations that illustrate the procedure and evaluate its performance under model misspecification. Finally, comparisons with other covariate selection methods on simulated data show that it is more effective at excluding instrumental variables and spurious covariates.
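A minimal sketch of the procedure follows. It rests on several assumptions that go beyond what is stated above. The propensity model is taken to be a main-effects logistic regression fitted by Newton-Raphson. The outcome model is taken to be linear in the treatment and covariates, with the intercept and treatment coefficient left unpenalized. The adaptive weights are set to $1/|\hat\beta_j|^{\gamma}$ from an unpenalized weighted least-squares fit, and the tuning parameter `lam` is fixed by hand rather than chosen by cross-validation or an information criterion. The function names `propensity_scores` and `iptw_adaptive_lasso` and the simulated data-generating model are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def propensity_scores(X, a, n_iter=25):
    """Hypothetical helper: logistic-regression propensity scores via Newton-Raphson."""
    Z = np.column_stack([np.ones(len(a)), X])  # intercept + main effects (assumed model)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        hessian = Z.T @ (Z * (p * (1.0 - p))[:, None])  # Fisher information
        beta += np.linalg.solve(hessian, Z.T @ (a - p))
    return 1.0 / (1.0 + np.exp(-Z @ beta))


def iptw_adaptive_lasso(X, a, y, lam, gamma=1.0, n_sweeps=1000, tol=1e-8):
    """Hypothetical helper: indices of covariates kept by an IPTW-weighted adaptive lasso.

    Assumed outcome model: y ~ b0 + tau * a + X @ beta, with b0 and tau unpenalized.
    """
    n, p = X.shape
    e = propensity_scores(X, a)
    w = a / e + (1 - a) / (1 - e)  # inverse probability of treatment weights
    w = w / w.mean()               # normalise weights to mean 1
    D = np.column_stack([np.ones(n), a, X])

    # Adaptive penalty factors from an unpenalized weighted least-squares fit.
    sw = np.sqrt(w)
    b_wls = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)[0]
    penalty = np.concatenate([[0.0, 0.0], 1.0 / np.abs(b_wls[2:]) ** gamma])

    # Coordinate descent on 0.5 * sum_i w_i (y_i - D_i b)^2 + lam * sum_j penalty_j |b_j|.
    b = np.zeros(p + 2)
    r = np.asarray(y, dtype=float).copy()  # current residual y - D @ b
    col_norm = (w[:, None] * D**2).sum(axis=0)
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p + 2):
            rho = np.sum(w * D[:, j] * (r + D[:, j] * b[j]))
            new = np.sign(rho) * max(abs(rho) - lam * penalty[j], 0.0) / col_norm[j]
            r -= D[:, j] * (new - b[j])
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return np.flatnonzero(b[2:])


# Hypothetical simulation: X0 confounder, X1 instrument, X2 outcome predictor,
# X3 and X4 spurious covariates.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
logit = 0.6 * X[:, 0] + 0.8 * X[:, 1]
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)
y = 1.0 * a + 1.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)

selected = iptw_adaptive_lasso(X, a, y, lam=50.0)
print(selected)
```

In this toy setting the sketch is meant to retain the confounder (index 0) and the outcome predictor (index 2), and to drop the instrument and the two spurious covariates. Because the simulated outcome model is correctly specified and linear, the example does not exercise the misspecification setting the procedure is designed for.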