The analysis of causation is a challenging task that can be approached in various ways. With the increasing use of machine-learning-based models in computational socioeconomics, explaining these models while taking causal connections into account has become a necessity. In this work, we advocate the use of an explanatory framework from cooperative game theory augmented with $do$-calculus, namely causal Shapley values. Using causal Shapley values, we analyze socioeconomic disparities that have a causal link to the spread of COVID-19 in the USA. We study several phases of the disease spread to show how the causal connections change over time. We also perform a causal analysis using random effects models and discuss the correspondence between the two methods to verify our results. We show the distinct advantages that non-linear machine learning models have over linear models when performing a multivariate analysis, especially since the machine learning models can capture non-linear correlations in the data. In addition, causal Shapley values allow the causal structure to be included in the variable importance computed for the machine learning model.
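To make the idea of causal Shapley values concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a hypothetical two-variable structural causal model in which `x1` causes `x2` (`x2 = A * x1 + noise`, both zero-mean) and a hypothetical linear model `f(x1, x2) = B1 * x1 + B2 * x2`. The coalition value is taken to be the interventional expectation $E[f(X) \mid do(X_S = x_S)]$, and the Shapley values are computed exactly by averaging marginal contributions over all feature orderings. All names and constants (`A`, `B1`, `B2`, `make_causal_value`, `shapley_values`) are illustrative assumptions.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical toy causal model (an assumption for illustration only):
# x1 -> x2, with x2 = A * x1 + zero-mean noise, and x1 zero-mean.
A = 2.0
# Hypothetical model to explain: f(x1, x2) = B1 * x1 + B2 * x2.
B1, B2 = 1.0, 3.0


def make_causal_value(x1, x2):
    """Build v(S) = E[f(X) | do(X_S = x_S)] for the toy model above."""
    def value(coalition):
        # x1 has no parents: it is either fixed by intervention or
        # stays at its mean (0). Intervening on x2 does not affect x1.
        e_x1 = x1 if "x1" in coalition else 0.0
        # x2 is either fixed by intervention or follows its cause x1.
        e_x2 = x2 if "x2" in coalition else A * e_x1
        return B1 * e_x1 + B2 * e_x2
    return value


phi = shapley_values(make_causal_value(1.0, 1.0), ["x1", "x2"])
print(phi)
```

In this toy setting, part of the credit for `x2` is shifted to its cause `x1`, because intervening on `x1` also moves `x2`. The attributions still sum to `f(x) - E[f(X)]`, which is the efficiency property of Shapley values.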