Causal inference increasingly relies on observational studies with rich covariate information. To build tractable causal procedures, such as doubly robust estimators, it is imperative to first extract important features from high- or even ultra-high-dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultra-high-dimensional data sets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving the efficiency of the resulting causal effect estimate. Previous empirical and theoretical studies imply that one should exclude causes of the treatment that are not confounders. Motivated by these results, we aim to keep all the predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome model-free procedure for propensity score model selection, thereby maintaining double robustness in the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency, normality and efficiency. Synthetic and real data analyses show that our proposal compares favorably with existing methods in a range of realistic settings.