We consider a global representation of a regression or classification function by decomposing it into the sum of main-effect and interaction components of arbitrary order. We propose a new identification constraint that allows for the extraction of interventional SHAP values and partial dependence plots, thereby unifying local and global explanations. Under our proposed identification, a feature's partial dependence plot corresponds to its main effect term plus the intercept. The interventional SHAP value of feature $k$ is a weighted sum of the main component and all interaction components that include $k$, with the weights given by the reciprocal of the component's dimension. This brings a new perspective to local explanations such as SHAP values, which were previously motivated only by game theory. We show that the decomposition can be used to reduce direct and indirect bias by removing all components that include a protected feature. Lastly, we motivate a new measure of feature importance. In principle, our proposed functional decomposition can be applied to any machine learning model, but exact calculation is feasible only for low-dimensional structures or ensembles thereof. We provide an algorithm and efficient implementation for gradient-boosted trees (xgboost) and random planted forest. Experiments suggest that our method provides meaningful explanations and reveals higher-order interactions. The proposed methods are implemented in an R package, available at \url{https://github.com/PlantedML/glex}.
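To make the claimed relationships concrete, the following display sketches them in notation we introduce here for illustration, writing $m_S$ for the component indexed by the feature subset $S$:
\begin{align*}
m(x) &= \sum_{S \subseteq \{1,\dots,p\}} m_S(x_S), \\
\mathrm{PD}_k(x_k) &= m_\emptyset + m_{\{k\}}(x_k), \\
\phi_k(x) &= \sum_{S \ni k} \frac{m_S(x_S)}{\lvert S \rvert},
\end{align*}
where $m$ is the fitted model, $\mathrm{PD}_k$ is the partial dependence of feature $k$, and $\phi_k$ is its interventional SHAP value under the proposed identification.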
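As a minimal numerical illustration of the reciprocal-dimension weighting (not part of the package; all component values below are made up), the following Python sketch computes SHAP values from a toy decomposition and checks that the intercept plus the SHAP values recovers the prediction:
\begin{verbatim}
# Toy illustration of the reciprocal-dimension SHAP weighting.
# Keys are tuples of feature indices; () is the intercept m_{}.
# All component values are made up for illustration.
components = {
    (): 0.5,          # intercept
    (1,): 0.8,        # main effect of feature 1 at some point x
    (2,): -0.3,       # main effect of feature 2
    (3,): 0.1,        # main effect of feature 3
    (1, 2): 0.4,      # pairwise interaction of features 1 and 2
    (1, 2, 3): -0.6,  # three-way interaction
}

def shap_value(components, k):
    """phi_k = sum over components S containing k of m_S / |S|."""
    return sum(v / len(S) for S, v in components.items() if k in S)

features = sorted({k for S in components for k in S})
phi = {k: shap_value(components, k) for k in features}
print(phi)  # {1: 0.8, 2: -0.3, 3: -0.1}, up to float rounding

# Efficiency: intercept + sum of SHAP values equals the prediction.
prediction = sum(components.values())
assert abs(components[()] + sum(phi.values()) - prediction) < 1e-12
\end{verbatim}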