Predictions about people, such as their expected educational achievement or their credit risk, can be performative and shape the outcome that they aim to predict. Understanding the causal effect of these predictions on the eventual outcomes is crucial for foreseeing the implications of future predictive models and selecting which models to deploy. However, this causal estimation task poses unique challenges: model predictions are usually deterministic functions of input features and highly correlated with outcomes, which can make the causal effects of predictions impossible to disentangle from the direct effect of the covariates. We study this problem through the lens of causal identifiability, and despite the hardness of this problem in full generality, we highlight three natural scenarios where the causal effect of predictions on outcomes can be identified from observational data: randomization in predictions or prediction-based decisions, overparameterization of the predictive model deployed during data collection, and discrete prediction outputs. We show empirically that, under suitable identifiability conditions, standard variants of supervised learning that predict from predictions can find transferable functional relationships between features, predictions, and outcomes, allowing for conclusions about newly deployed prediction models. Our positive results fundamentally rely on model predictions being recorded during data collection, bringing forward the importance of rethinking standard data collection practices to enable progress towards a better understanding of social outcomes and performative feedback loops.
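The "predict from predictions" idea above can be illustrated with a minimal simulation sketch (all coefficients and noise scales here are illustrative assumptions, not from the paper): outcomes depend on both the covariate and the deployed model's prediction, and explicit randomization in the prediction is what makes the two effects separable by ordinary regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)

# Predictions of the deployed model, with explicit randomization
# (one of the identifiability scenarios): the injected noise breaks
# the deterministic link between features and predictions.
yhat_old = 0.5 * x + rng.normal(scale=0.5, size=n)

# Outcome depends on the covariate directly AND on the prediction
# (the performative effect). Coefficients 1.0 and 2.0 are illustrative.
y = 1.0 * x + 2.0 * yhat_old + rng.normal(scale=0.1, size=n)

# "Predict from predictions": regress outcomes on (feature, prediction).
X = np.column_stack([x, yhat_old])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted relationship transfers: plug in a NEW model's predictions
# to anticipate outcomes under its deployment.
yhat_new = -0.3 * x
y_new_est = np.column_stack([x, yhat_new]) @ coef
```

Without the randomization in `yhat_old`, the design matrix would be rank-deficient (prediction a deterministic function of the feature), and the direct and performative effects could not be disentangled, which is exactly the hardness the abstract describes.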