Explaining by Removing: A Unified Framework for Model Explanation

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
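The three dimensions above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation. The names `make_value_function`, `leave_one_out`, and `shapley_values` are hypothetical. The sketch makes three assumptions. First, features are removed by substituting fixed baseline values, which is one simple removal choice among several. Second, the explained behavior is the model's prediction for a single input. Third, the summary is either leave-one-out differences or exact Shapley values, computed by enumerating every subset of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

Model = Callable[[Sequence[float]], float]
ValueFn = Callable[[FrozenSet[int]], float]


def make_value_function(model: Model, x: Sequence[float],
                        baseline: Sequence[float]) -> ValueFn:
    """Dimensions 1 and 2 (hypothetical helper): remove features by baseline
    substitution (an assumed removal strategy) and explain the prediction for x."""
    def value(kept: FrozenSet[int]) -> float:
        # Features not in `kept` are "removed" by replacing them with baseline values.
        masked = [x[i] if i in kept else baseline[i] for i in range(len(x))]
        return model(masked)
    return value


def leave_one_out(value: ValueFn, d: int) -> List[float]:
    """Dimension 3, option A: drop in value when each feature alone is removed."""
    full = frozenset(range(d))
    return [value(full) - value(full - {i}) for i in range(d)]


def shapley_values(value: ValueFn, d: int) -> List[float]:
    """Dimension 3, option B: exact Shapley values by enumerating all subsets.
    The cost grows exponentially with d, so this is only practical for small d."""
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d! for coalitions S not containing i.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Usage with a toy model that contains one interaction term (x0 * x2).
toy_model = lambda v: 2 * v[0] + 3 * v[1] + v[0] * v[2]
toy_value = make_value_function(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(leave_one_out(toy_value, 3))                             # [3.0, 3.0, 1.0]
print([round(p, 6) for p in shapley_values(toy_value, 3)])     # [2.5, 3.0, 0.5]
```

In this toy model the Shapley summary splits the interaction term evenly between features 0 and 2, and its values sum to 6, the difference between the full prediction and the all-removed prediction. The leave-one-out scores credit that interaction to both features and sum to 7. Because exact enumeration grows exponentially with the number of features, practical methods rely on approximations.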

