In explainable machine learning, local post-hoc explanation algorithms and inherently interpretable models are often seen as competing approaches. In this work, we offer a partial reconciliation between these two approaches by showing that Shapley Values correspond to Generalized Additive Models (GAMs). We introduce $n$-Shapley Values, a parametric family of local post-hoc explanation algorithms that explain individual predictions with interaction terms up to order $n$. By varying the parameter $n$, these explanations cover the entire range from Shapley Values up to a uniquely determined decomposition of the function that we attempt to explain. The relationship between $n$-Shapley Values and this decomposition offers a functionally-grounded characterization of Shapley Values, and highlights the limitations of these explanations. We then show that $n$-Shapley Values recover GAMs with interaction terms up to order $n$, which implies that the original Shapley Values recover GAMs without interaction terms. Taken together, our results offer a precise characterization of Shapley Values as they are being used in explainable machine learning. Python code to estimate $n$-Shapley Values and replicate the results in this paper is available at \url{https://github.com/tml-tuebingen/nshap}.
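For illustration, the following is a minimal from-scratch Python sketch of exact Shapley Values for a single prediction, using an interventional value function estimated from a background sample. It is not the interface of the \texttt{nshap} package linked above; the function names, the model callable \texttt{f}, and the \texttt{background} array are assumptions made for this example, and the exhaustive subset enumeration is only feasible for a small number of features.
\begin{verbatim}
# Minimal sketch (assumption: not the nshap package API): exact Shapley
# Values for one prediction f(x), with the interventional value function
# v(S) = E_{x'}[ f(x_S, x'_{-S}) ] estimated from a background sample.
# Enumerates all subsets, so it is only practical for small d.
from itertools import combinations
from math import comb

import numpy as np


def value_function(f, x, background, S):
    """v(S): mean model output with features in S fixed to x,
    the remaining features drawn from the background data."""
    X = background.copy()
    X[:, list(S)] = x[list(S)]
    return f(X).mean()


def shapley_values(f, x, background):
    """Exact Shapley Values phi_1, ..., phi_d for the prediction f(x)."""
    d = x.shape[0]
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d! = 1 / (d * C(d-1, |S|))
            weight = 1.0 / (d * comb(d - 1, size))
            for S in combinations(others, size):
                phi[i] += weight * (
                    value_function(f, x, background, S + (i,))
                    - value_function(f, x, background, S)
                )
    return phi
\end{verbatim}
In this sketch, \texttt{f} is any callable mapping a 2-d array of feature rows to a 1-d array of model outputs; $n$-Shapley Values additionally attribute interaction terms up to order $n$ on top of such per-feature attributions, as described in the abstract.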