Local explainability methods -- those that seek to generate an explanation for each prediction -- are becoming increasingly prevalent due to the need for practitioners to rationalize their model outputs. However, comparing local explainability methods is difficult, since each method generates outputs of different scales and dimensions. Furthermore, due to the stochastic nature of some explainability methods, different runs of a method may produce contradictory explanations for a given observation. In this paper, we propose a topology-based framework for extracting a simplified representation from a set of local explanations. We do so by first modeling the relationship between the explanation space and the model predictions as a scalar function. Then, we compute the topological skeleton of this function. This topological skeleton acts as a signature for such functions, which we use to compare different explanation methods. We demonstrate that our framework not only reliably identifies differences between explainability techniques but also provides stable representations. We then show how our framework can be used to identify appropriate parameters for local explainability methods. Our framework is simple, does not require complex optimizations, and can be broadly applied to most local explanation methods. We believe the practicality and versatility of our approach will help promote topology-based approaches as a tool for understanding and comparing explanation methods.
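To make the pipeline concrete, here is a minimal sketch of the two steps the abstract describes: treating the model predictions as a scalar function over the explanation space, then extracting a topological skeleton from it. This is an illustrative assumption, not the paper's implementation: it uses a Mapper graph (via the kmapper library) as the skeleton, synthetic stand-in data in place of real explanations, and placeholder parameter values throughout.

```python
# A hedged sketch of the described pipeline, NOT the paper's actual method:
# we assume the topological skeleton is a Mapper graph built with kmapper,
# and use synthetic stand-ins for explanations and predictions.
import numpy as np
import kmapper as km
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Stand-in for per-observation local explanations (e.g., attribution vectors).
explanations = rng.normal(size=(500, 8))
# Stand-in for the model's predictions on the same observations.
predictions = explanations @ rng.normal(size=8)

# Step 1: use the predictions as a scalar function (lens) over explanation space.
mapper = km.KeplerMapper(verbose=0)
lens = predictions.reshape(-1, 1)  # one scalar value per observation

# Step 2: compute a simplified topological summary of that function.
# Cover/clusterer settings here are arbitrary placeholders.
graph = mapper.map(
    lens,
    explanations,
    cover=km.Cover(n_cubes=10, perc_overlap=0.3),
    clusterer=DBSCAN(eps=1.0, min_samples=3),
)

# The resulting graph acts as a signature that can be compared across
# explanation methods or across parameter settings of one method.
print(len(graph["nodes"]), "nodes;", len(graph["links"]), "linked nodes")
```

Under this reading, comparing two explanation methods reduces to building one such skeleton per method on the same observations and comparing the resulting graphs; the stability claim corresponds to the skeleton changing little across repeated stochastic runs of the same method.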