This paper studies the problem of estimating the contribution of each feature to a machine learning model's prediction for a specific instance, as well as the overall contribution of a feature to the model. The causal effect of a feature (variable) on the predicted outcome closely reflects the feature's contribution to a prediction. A challenge is that most existing causal effects cannot be estimated from data without a known causal graph. In this paper, we define an explanatory causal effect based on a hypothetical ideal experiment. The definition brings several benefits to model-agnostic explanations. First, explanations are transparent and have causal meanings. Second, the explanatory causal effect estimation can be data-driven. Third, the causal effects provide both a local explanation for a specific prediction and a global explanation showing the overall importance of a feature in a predictive model. We further propose an explanation method that uses individual and combined variables based on explanatory causal effects. Experiments on some real-world data sets show that the definition and the method work.