This paper considers treatment evaluation when outcomes are only observed for a subpopulation due to sample selection or outcome attrition/non-response. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. To control in a data-driven way for potentially high-dimensional pre-treatment covariates that motivate the selection-on-observables assumptions, we adapt the double machine learning framework to sample selection problems. That is, we make use of (a) Neyman-orthogonal and doubly robust score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models, and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners. The estimators are available in the causalweight package for the statistical software R.
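The combination of a doubly robust score and cross-fitting can be illustrated with a small simulation. The following is only a minimal sketch, not the causalweight implementation: it assumes the selection-on-observables (missing-at-random) case, a single binary covariate, and simple cell-frequency estimates standing in for the machine learners. The score form used, mu + 1{D=d} S (Y - mu) / (p pi), is an assumed standard form that the abstract does not state, and all names (`simulate`, `fit_nuisances`, `dr_score`, `crossfit_ate`) as well as the data-generating process are hypothetical.

```python
import random
from statistics import mean, stdev


def simulate(n, seed=0):
    """Hypothetical data: binary covariate X, treatment D, selection S.
    The outcome Y is only observed when S == 1; the true ATE is 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        d = 1 if rng.random() < 0.3 + 0.4 * x else 0
        # Selection depends on (D, X) only, so missing-at-random holds.
        s = 1 if rng.random() < 0.5 + 0.2 * x + 0.1 * d else 0
        y = 1.0 * d + 2.0 * x + rng.gauss(0, 1)
        data.append((x, d, s, y if s == 1 else None))
    return data


def fit_nuisances(train):
    """Cell-frequency stand-ins for the machine learners.
    Returns {(x, d): (Pr(D=d|X=x), Pr(S=1|D=d,X=x), E[Y|D=d,S=1,X=x])}."""
    nuis = {}
    for x in (0, 1):
        rows_x = [r for r in train if r[0] == x]
        p1 = mean(r[1] for r in rows_x)
        for d in (0, 1):
            rows_xd = [r for r in rows_x if r[1] == d]
            pi = mean(r[2] for r in rows_xd)
            mu = mean(r[3] for r in rows_xd if r[2] == 1)
            nuis[(x, d)] = (p1 if d == 1 else 1 - p1, pi, mu)
    return nuis


def dr_score(row, d, nuis):
    """Doubly robust score for the mean potential outcome under treatment d
    (assumed form): outcome prediction plus an inverse-probability-weighted
    residual for selected units that received treatment d."""
    x, d_obs, s, y = row
    p, pi, mu = nuis[(x, d)]
    correction = (y - mu) / (p * pi) if (d_obs == d and s == 1) else 0.0
    return mu + correction


def crossfit_ate(data, k=3):
    """Cross-fitting: nuisances are fitted on k-1 folds and the scores are
    evaluated on the held-out fold, so no unit is scored by its own fit."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        nuis = fit_nuisances(train)
        scores += [dr_score(r, 1, nuis) - dr_score(r, 0, nuis) for r in held_out]
    ate = mean(scores)
    se = stdev(scores) / len(scores) ** 0.5  # plug-in standard error
    return ate, se


ate, se = crossfit_ate(simulate(4000))
print(f"ATE estimate: {ate:.3f} (standard error {se:.3f})")
```

In this simulated design the estimate should land close to the true effect of 1. In practice the cell-frequency fits would be replaced by flexible learners (for example lasso or random forests) for the treatment, selection, and outcome models.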