Despite the plethora of post hoc model explanation methods, the basic properties and behavior of these methods, and the conditions under which each one is effective, are not well understood. In this work, we bridge these gaps and address a fundamental question: Which explanation method should one use in a given situation? To this end, we adopt a function approximation perspective and formalize the local function approximation (LFA) framework. We show that popular explanation methods are instances of this framework, performing function approximations of the underlying model in different neighborhoods using different loss functions. We introduce a no free lunch theorem for explanation methods, which demonstrates that no single method can perform optimally across all neighborhoods and therefore calls for choosing among methods. To choose among methods, we set forth a guiding principle based on the function approximation perspective: a method is considered effective if it recovers the underlying model when the model is a member of the explanation function class. We then analyze the conditions under which popular explanation methods are effective and provide recommendations for choosing among explanation methods and creating new ones. Lastly, we empirically validate our theoretical results using various real-world datasets, model classes, and prediction tasks. By providing a principled mathematical framework that unifies diverse explanation methods, our work characterizes the behavior of these methods and their relation to one another, guides the choice of explanation methods, and paves the way for the creation of new ones.
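The function approximation view described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes one particular instance of the idea: a linear explanation function class, a Gaussian neighborhood around the input, and a squared-error loss. The names `local_linear_approximation`, `linear_model`, `nonlinear_model`, `sigma`, and `n_samples` are hypothetical and chosen only for illustration.

```python
import numpy as np

def local_linear_approximation(f, x0, sigma=0.1, n_samples=2000, seed=0):
    """Fit g(x) = w @ x + b to the model f near x0.

    Assumed choices for this sketch (not taken from the paper):
      - neighborhood: x0 + xi with xi ~ N(0, sigma^2 I)
      - loss: squared error between f and g on the sampled points
    """
    rng = np.random.default_rng(seed)
    # Sample points from the neighborhood around x0.
    X = x0 + sigma * rng.standard_normal((n_samples, x0.shape[0]))
    # Query the underlying model at each sampled point.
    y = np.array([f(x) for x in X])
    # Append a column of ones so the intercept b is fitted jointly with w.
    A = np.hstack([X, np.ones((n_samples, 1))])
    # Least-squares solution minimizes the squared-error loss.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Case 1: the model is itself linear, so it belongs to the explanation class.
true_w = np.array([2.0, -1.0, 0.5])
true_b = 0.3

def linear_model(x):
    return float(true_w @ x + true_b)

x0 = np.array([1.0, 0.0, -1.0])
w_hat, b_hat = local_linear_approximation(linear_model, x0)

# Case 2: a nonlinear model; the fitted weights depend on the neighborhood.
def nonlinear_model(x):
    return float(np.sin(x[0]) + x[1] ** 2)

x1 = np.array([0.0, 1.0, 0.0])
w_small, _ = local_linear_approximation(nonlinear_model, x1, sigma=0.01)
w_large, _ = local_linear_approximation(nonlinear_model, x1, sigma=2.0)
```

In the first case, the fitted `w_hat` and `b_hat` match `true_w` and `true_b` up to numerical precision, which is the sense of "recovering the underlying model" used in the guiding principle. In the second case, the small-neighborhood weights `w_small` are close to the gradient of `nonlinear_model` at `x1`, while `w_large` differs noticeably, showing in this sketch how the choice of neighborhood changes the resulting explanation.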