Predictive models, such as those from machine learning, can underpin causal inference to estimate the effects of an intervention at the population or individual level. This opens the door to a plethora of models, useful to match the increasing complexity of health data, but also the Pandora's box of model selection: which of these models yields the most valid causal estimates? Classic machine-learning cross-validation procedures are not directly applicable. Indeed, an appropriate selection procedure for causal inference should weight equally the outcome errors of each individual, treated or not treated, whereas one outcome may be seldom observed in a sub-population. We study how more elaborate risks benefit causal model selection. We show theoretically that simple risks are brittle to weak overlap between treated and non-treated individuals, as well as to heterogeneous errors between populations. Rather, a more elaborate metric, the R-risk, appears as a proxy of the oracle error on causal estimates, observable at the cost of an overlap re-weighting. As the R-risk is defined not only from model predictions but also from the conditional mean outcome and the treatment probability, using it for model selection requires adapting cross-validation. Extensive experiments show that the resulting procedure yields the best causal model selection.
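As a minimal sketch of the metric discussed above: the R-risk scores a candidate treatment-effect model by its residual-on-residual squared error, using estimates of the conditional mean outcome and the treatment probability (propensity score) as nuisance quantities. The function and variable names below (`r_risk`, `tau_hat`, `m_hat`, `e_hat`) are illustrative, not from the paper; in practice the nuisance estimates would come from cross-fitted models.

```python
import numpy as np

def r_risk(tau_hat, y, w, m_hat, e_hat):
    """R-risk of a candidate treatment-effect model.

    tau_hat : predicted individual treatment effects
    y       : observed outcomes
    w       : binary treatment indicators (0/1)
    m_hat   : estimated conditional mean outcome E[Y | X]
    e_hat   : estimated treatment probability P(W = 1 | X)
    """
    # Outcome residual vs. treatment residual, scaled by the
    # candidate effect: small when tau_hat explains the part of Y
    # not captured by the mean outcome model.
    return np.mean(((y - m_hat) - (w - e_hat) * tau_hat) ** 2)

# Toy check: when the outcome is exactly m + (w - e) * tau,
# the true effect model attains zero R-risk.
y = np.array([1.0, 2.0])
w = np.array([1.0, 0.0])
m_hat = np.array([0.5, 2.5])
e_hat = np.array([0.5, 0.5])
tau_true = np.array([1.0, 1.0])
print(r_risk(tau_true, y, w, m_hat, e_hat))  # 0.0
```

A useful property for model selection, visible in the formula, is that the treatment residual `(w - e_hat)` down-weights regions of weak overlap, which is the re-weighting the abstract refers to.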