Predictive models, such as those from machine learning, can underpin causal inference to estimate the effects of an intervention at the population or individual level. This opens the door to a plethora of models, useful to match the increasing complexity of health data, but also the Pandora's box of model selection: which of these models yields the most valid causal estimates? Classic machine-learning cross-validation procedures are not directly applicable. Indeed, an appropriate selection procedure for causal inference should weight equally the outcome errors of each individual, treated or not treated, whereas one outcome may be seldom observed in a sub-population. We study how more elaborate risks benefit causal model selection. We show theoretically that simple risks are brittle to weak overlap between treated and non-treated individuals, as well as to heterogeneous errors between populations. Rather, a more elaborate metric, the R-risk, appears as a proxy of the oracle error on causal estimates, observable at the cost of an overlap re-weighting. As the R-risk is defined not only from model predictions but also from the conditional mean outcome and the treatment probability, using it for model selection requires adapting cross-validation. Extensive experiments show that the resulting procedure yields the best causal model selection.
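As a minimal sketch of the metric discussed above: the R-risk scores a candidate treatment-effect model by its residual-on-residual squared error, using estimates of the conditional mean outcome and the treatment probability (propensity score) as nuisance quantities. The function and variable names below (`r_risk`, `tau_hat`, `m_hat`, `e_hat`) are illustrative, not from the paper; in practice the nuisance estimates would come from cross-fitted models.

```python
import numpy as np

def r_risk(tau_hat, y, w, m_hat, e_hat):
    """R-risk of a candidate treatment-effect model.

    tau_hat : predicted individual treatment effects
    y       : observed outcomes
    w       : binary treatment indicators (0/1)
    m_hat   : estimated conditional mean outcome E[Y | X]
    e_hat   : estimated treatment probability P(W = 1 | X)
    """
    # Outcome residual vs. treatment residual, scaled by the
    # candidate effect: small when tau_hat explains the part of Y
    # not captured by the mean outcome model.
    return np.mean(((y - m_hat) - (w - e_hat) * tau_hat) ** 2)

# Toy check: when the outcome is exactly m + (w - e) * tau,
# the true effect model attains zero R-risk.
y = np.array([1.0, 2.0])
w = np.array([1.0, 0.0])
m_hat = np.array([0.5, 2.5])
e_hat = np.array([0.5, 0.5])
tau_true = np.array([1.0, 1.0])
print(r_risk(tau_true, y, w, m_hat, e_hat))  # 0.0
```

A useful property for model selection, visible in the formula, is that the treatment residual `(w - e_hat)` down-weights regions of weak overlap, which is the re-weighting the abstract refers to.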