Predictions about people, such as their expected educational achievement or their credit risk, can be performative and shape the outcome that they aim to predict. Understanding the causal effect of these predictions on the eventual outcomes is crucial for foreseeing the implications of future predictive models and selecting which models to deploy. However, this causal estimation task poses unique challenges: model predictions are usually deterministic functions of input features and highly correlated with outcomes. This can make the causal effect of predictions on outcomes impossible to disentangle from the direct effect of the covariates. We study this problem through the lens of causal identifiability. Despite the hardness of the problem in full generality, we highlight three natural scenarios where the causal relationship between covariates, predictions, and outcomes can be identified from observational data: randomization in predictions, overparameterization of the predictive model deployed during data collection, and discrete prediction outputs. Empirically, we show that when our identifiability conditions hold, standard variants of supervised learning that predict from predictions, treating the prediction as an input feature, can indeed find transferable functional relationships that allow for conclusions about newly deployed predictive models. These positive results fundamentally rely on model predictions being recorded during data collection, which highlights the importance of rethinking standard data collection practices to enable progress towards a better understanding of social outcomes and performative feedback loops.
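The identifiability problem and the "randomization in predictions" scenario can be illustrated with a small simulation. The sketch below is not taken from the paper: the linear structural model, the coefficients, the deployed model f(x) = 2x, the new model g(x) = 0.5x, and all function names are assumptions chosen for illustration. It fits an ordinary least-squares regression of the outcome on the features (x, ŷ), once with deterministic predictions and once with noise added to them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed (hypothetical) structural model, chosen only for illustration:
#   y = 1.0 * x + 0.5 * y_hat + noise
# so the performative effect of the prediction on the outcome is 0.5.
TRUE_X_EFFECT, TRUE_PRED_EFFECT = 1.0, 0.5


def simulate_outcome(x, y_hat):
    """Draw outcomes from the assumed structural model."""
    noise = rng.normal(scale=0.1, size=x.shape[0])
    return TRUE_X_EFFECT * x + TRUE_PRED_EFFECT * y_hat + noise


def predict_from_predictions(x, y_hat, y):
    """Least-squares fit of y on the features (x, y_hat)."""
    features = np.column_stack([x, y_hat])
    coef, _, rank, _ = np.linalg.lstsq(features, y, rcond=None)
    return coef, rank


x = rng.normal(size=n)

# Case 1: deterministic deployed model f(x) = 2x.
# The columns x and y_hat are collinear, so the two effects cannot be separated.
y_hat_det = 2.0 * x
coef_det, rank_det = predict_from_predictions(
    x, y_hat_det, simulate_outcome(x, y_hat_det)
)

# Case 2: the same model with randomized predictions (added noise).
y_hat_rand = 2.0 * x + rng.normal(size=n)
coef_rand, rank_rand = predict_from_predictions(
    x, y_hat_rand, simulate_outcome(x, y_hat_rand)
)

# Transfer to a hypothetical new model g(x) = 0.5x: the implied slope of y in x.
NEW_MODEL_SLOPE = 0.5
true_slope = TRUE_X_EFFECT + TRUE_PRED_EFFECT * NEW_MODEL_SLOPE
slope_det = coef_det[0] + coef_det[1] * NEW_MODEL_SLOPE
slope_rand = coef_rand[0] + coef_rand[1] * NEW_MODEL_SLOPE

print("feature rank (deterministic, randomized):", rank_det, rank_rand)
print("fitted coefficients with randomization:", coef_rand)
print("slope under new model (true, deterministic fit, randomized fit):",
      true_slope, slope_det, slope_rand)
```

Under these assumptions, the deterministic case yields a rank-deficient feature matrix, and the minimum-norm solution returned by least squares need not transfer to the new model. With randomized predictions the regression can separate the two effects. The paper's other two scenarios, overparameterization and discrete prediction outputs, are not covered by this sketch.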