Causal inference increasingly relies on observational studies with rich covariate information. To build tractable causal procedures, such as doubly robust estimators, it is imperative to first extract important features from high-dimensional or even ultra-high-dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultra-high-dimensional data sets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving efficiency in the resulting causal effect estimate. Previous empirical and theoretical studies suggest excluding causes of the treatment that are not confounders. Motivated by these results, our goal is to keep all the predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome model-free procedure for propensity score model selection, thereby maintaining double robustness in the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency and pointwise normality. Synthetic and real data analyses show that our proposal compares favorably with existing methods in a range of realistic settings. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
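For readers unfamiliar with the doubly robust estimators mentioned in the abstract, the following is a minimal sketch of the standard augmented inverse probability weighting (AIPW) form of the average treatment effect. It is a textbook illustration only, not the causal ball screening procedure itself. The function name `aipw_ate`, its argument names, and the clipping threshold are hypothetical choices made here, and the propensity scores and outcome predictions are assumed to have been fitted elsewhere on the selected covariates.

```python
import numpy as np


def aipw_ate(y, a, e_hat, m1_hat, m0_hat, clip=1e-3):
    """Textbook AIPW estimate of the average treatment effect.

    y      : observed outcomes
    a      : binary treatment indicators (1 = treated, 0 = control)
    e_hat  : estimated propensity scores P(A = 1 | X)
    m1_hat : outcome-model predictions under treatment
    m0_hat : outcome-model predictions under control
    clip   : assumed safeguard that keeps propensities away from 0 and 1
    """
    y, a, e_hat, m1_hat, m0_hat = (
        np.asarray(v, dtype=float) for v in (y, a, e_hat, m1_hat, m0_hat)
    )
    e = np.clip(e_hat, clip, 1.0 - clip)

    # Outcome-model contrast plus inverse-probability-weighted residuals.
    scores = (
        m1_hat - m0_hat
        + a * (y - m1_hat) / e
        - (1.0 - a) * (y - m0_hat) / (1.0 - e)
    )
    return float(scores.mean())
```

In this toy form the estimate remains on target when either nuisance model is correct: if the outcome predictions match the data, the weighted residuals vanish whatever the propensity scores are, and if the propensity scores are correct, the weighted residuals offset errors in the outcome predictions. This is the double robustness property that the outcome model-free propensity score selection is designed to preserve.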