Shapley values are a popular explainable AI (XAI) approach for quantifying the feature importance of a given model. They originate in cooperative game theory, so computing them in an XAI context requires a so-called value function, which assigns a "value" to each subset of features and thereby connects machine learning models to cooperative games. There are many possible choices of value function, which broadly fall into two categories: on-manifold and off-manifold value functions, which take an observational and an interventional viewpoint, respectively. Both classes have flaws: on-manifold value functions violate key axiomatic properties and are computationally expensive, while off-manifold value functions pay less heed to the data manifold and evaluate the model on regions for which it was not trained. As a result, there is no consensus on which class of value functions to use. In this paper, we show that, in addition to these existing issues, both classes of value functions are prone to adversarial manipulations on low-density regions. We formalize the desiderata for value functions that respect both the model and the data manifold, and that are robust to perturbations on off-manifold regions, as a set of axioms. We show that there exists a unique value function satisfying these axioms, which we term the Joint Baseline value function; we call the resulting Shapley value the Joint Baseline Shapley (JBshap). We validate the effectiveness of JBshap in experiments.
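To make the role of a value function concrete, the following is a minimal sketch of exact Shapley values computed from a generic off-manifold (baseline) value function, in which features outside the subset are replaced by a fixed baseline. It is an illustration only and is not the Joint Baseline value function. The names `shapley_values`, `baseline_value_fn`, and `toy_model`, as well as the toy inputs, are hypothetical and chosen for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over feature indices 0..n-1."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition S
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def baseline_value_fn(model, x, baseline):
    """Off-manifold value function (an assumption for this sketch):
    features in S keep their value from x, the rest take the baseline."""
    def v(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
        return model(z)
    return v


def toy_model(z):
    # Hypothetical model: one additive term and one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(baseline_value_fn(toy_model, x, baseline), len(x))
print(phi)
```

In this toy setting the attributions sum to `toy_model(x) - toy_model(baseline)`, which is the efficiency property of Shapley values. Note that the hybrid points passed to the model, such as `[0.0, 2.0, 3.0]`, need not resemble any training data; this is the off-manifold evaluation issue described above. The enumeration is exponential in the number of features, so the sketch is only practical for small inputs.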