Attribution methods have been shown to be promising approaches for identifying the key features behind learned model predictions. While most existing attribution methods rely on a baseline input for performing feature perturbations, limited research has addressed the baseline selection problem. Poor choices of baseline limit the ability to produce one-vs-one (1-vs-1) explanations for multi-class classifiers, meaning the attribution methods were not able to explain why an input belongs to its original class rather than another specified target class. 1-vs-1 explanations are crucial when certain classes are more similar than others, e.g., two bird types among multiple animals, because they focus on the key differentiating features rather than the features shared across classes. In this paper, we present GAN-based Model EXplainability (GANMEX), a novel approach that applies Generative Adversarial Networks (GANs) by incorporating the to-be-explained classifier as part of the adversarial networks. Our approach effectively selects the counterfactual baseline as the closest realistic sample belonging to the target class, which allows attribution methods to provide true 1-vs-1 explanations. We showed that GANMEX baselines improved the saliency maps and led to stronger performance on perturbation-based evaluation metrics over existing baselines. Existing attribution results are known to be insensitive to model randomization, and we demonstrated that GANMEX baselines led to better outcomes under cascading randomization of the model.