A critical problem in the field of post hoc explainability is the lack of a common foundational goal among methods. For example, some methods are motivated by function approximation, some by game theoretic notions, and some by obtaining clean visualizations. This fragmentation of goals causes not only an inconsistent conceptual understanding of explanations but also the practical challenge of not knowing which method to use when. In this work, we begin to address these challenges by unifying eight popular post hoc explanation methods (LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradients x Input, SmoothGrad, and Integrated Gradients). We show that these methods all perform local function approximation of the black-box model, differing only in the neighbourhood and loss function used to perform the approximation. This unification enables us to (1) state a no free lunch theorem for explanation methods, demonstrating that no method can perform optimally across all neighbourhoods, and (2) provide a guiding principle to choose among methods based on faithfulness to the black-box model. We empirically validate these theoretical results using various real-world datasets, model classes, and prediction tasks. By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.
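The local function approximation view can be illustrated with a minimal sketch. It fits an interpretable linear surrogate to a black-box model over a neighbourhood of the input by minimising a loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `black_box` function, the Gaussian neighbourhood, the squared-error loss, and the helper name `local_linear_explanation`. The point is only that the same fitting procedure gives different attributions when the neighbourhood changes.

```python
import numpy as np

def black_box(X):
    # Hypothetical stand-in for a trained model: maps rows of X to scalar outputs.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(f, x, sigma, n_samples=2000, seed=0):
    """Fit a linear surrogate to f around x and return its weights.

    Assumed neighbourhood: Gaussian perturbations x + sigma * noise.
    Assumed loss: squared error between f and the surrogate.
    """
    rng = np.random.default_rng(seed)
    Z = x + sigma * rng.standard_normal((n_samples, x.size))
    # Design matrix: centred perturbations plus an intercept column.
    A = np.hstack([Z - x, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, f(Z), rcond=None)
    return coef[:-1]  # drop the intercept; the weights are the attributions

x0 = np.array([0.5, 1.0])
narrow = local_linear_explanation(black_box, x0, sigma=0.01)
wide = local_linear_explanation(black_box, x0, sigma=1.0)
print("narrow neighbourhood:", narrow)
print("wide neighbourhood:  ", wide)
```

With a very narrow neighbourhood the fitted weights should sit close to the gradient of the toy function at `x0`, while a wide neighbourhood averages over its curvature and gives a visibly different weight for the first feature. This is one concrete way to see why the choice of neighbourhood matters when selecting an explanation method.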