We consider a global explanation of a regression or classification function by decomposing it into the sum of main components and interaction components of arbitrary order. After adding an identification constraint motivated by a causal interpretation, we find q-interaction SHAP to be the unique solution to that constraint, where q denotes the highest order of interaction present in the decomposition. Our result provides a new perspective on SHAP values with various practical and theoretical implications: if SHAP values are decomposed into main and all interaction effects, they provide a global explanation with a causal interpretation. In principle, the decomposition can be applied to any machine learning model. However, since the number of possible interactions grows exponentially with the number of features, exact calculation is only feasible for methods that fit low-dimensional structures or ensembles of those. We provide an algorithm and efficient implementation that calculates this decomposition for gradient boosted trees (xgboost) and random planted forests. Our experiments suggest that the method provides meaningful explanations and reveals interactions of higher orders. We also investigate further potential of our new insights by using the global explanation to motivate a new measure of feature importance and to reduce direct and indirect bias by post-hoc component removal.