We propose a counterfactual approach to train ``causality-aware'' predictive models that are able to leverage causal information in static anticausal machine learning tasks (i.e., prediction tasks where the outcome influences the features). In applications plagued by confounding, the approach can be used to generate predictions that are free from the influence of observed confounders. In applications involving observed mediators, the approach can be used to generate predictions that capture only the direct or the indirect causal influences. Mechanistically, we train supervised learners on (counterfactually) simulated features which retain only the associations generated by the causal relations of interest. We focus on linear models, where analytical results connecting covariances, causal effects, and prediction mean squared errors are readily available. Importantly, we show that our approach does not require knowledge of the full causal graph; it suffices to know which variables represent potential confounders and/or mediators. We discuss the stability of the method with respect to dataset shifts generated by selection biases and validate the approach using synthetic data experiments.
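The mechanism described above can be illustrated with a minimal sketch in a hypothetical single-confounder linear setting. All coefficients, variable names, and the specific deconfounding step below are illustrative assumptions, not the paper's exact procedure: an observed confounder C influences both the outcome Y and the feature X, the anticausal edge Y -> X generates the association of interest, and the confounder's contribution is subtracted out to form a counterfactually simulated feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical anticausal setting with an observed confounder C:
# C -> Y, C -> X, and Y -> X (the outcome influences the feature).
C = rng.normal(size=n)
Y = 1.0 * C + rng.normal(size=n)
X = 0.8 * Y + 1.5 * C + rng.normal(size=n)

# Estimate C's direct contribution to X by regressing X on (Y, C).
# Note that only knowledge of which variable is a confounder is
# needed here, not the full causal graph.
A = np.column_stack([Y, C])
(b_Y, b_C), *_ = np.linalg.lstsq(A, X, rcond=None)

# Counterfactually simulated feature: remove the confounder's
# contribution, retaining only the association generated by Y -> X.
X_cf = X - b_C * C

# Train a linear predictor of Y on the simulated feature; its
# predictions are (approximately) free of the confounder's influence.
w = np.cov(X_cf, Y)[0, 1] / np.var(X_cf)
```

A predictor trained on the raw feature X would absorb the backdoor association through C, whereas the predictor fit on X_cf only exploits the causal link from Y to X.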