Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's prediction. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, especially on whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. As a result, we highlight the complementarity of these two approaches. Our evaluation on three benchmark datasets (Adult-Income, LendingClub, and German-Credit) confirms this complementarity. Feature attribution methods like LIME and SHAP and counterfactual explanation methods like Wachter et al. and DiCE often do not agree on feature importance rankings. In addition, by restricting the features that can be modified when generating counterfactual examples, we find that the top-k features from LIME or SHAP are often neither necessary nor sufficient explanations of a model's prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.
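The necessity and sufficiency checks sketched above can be illustrated on a toy example. The following is a hypothetical sketch, not the paper's actual method: the model, feature names, and thresholds are invented for illustration. A feature subset is treated as sufficient if some counterfactual flips the prediction by changing only features in that subset, and necessary if every counterfactual changes at least one feature in it.

```python
# Hypothetical illustration of necessity/sufficiency of a feature subset,
# given a query instance and counterfactual examples that flip the prediction.
# The model and data below are toy stand-ins, not from the paper.

def model(x):
    # Toy classifier: approve (1) if income + credit score is high enough.
    return 1 if x["income"] + x["credit"] > 100 else 0

query = {"income": 40, "credit": 50}   # predicted 0 (rejected)
counterfactuals = [                    # minimal changes that flip to 1
    {"income": 60, "credit": 50},
    {"income": 40, "credit": 70},
    {"income": 55, "credit": 60},
]

def changed_features(cf, x):
    return {f for f in x if cf[f] != x[f]}

def is_sufficient(subset, x, cfs):
    # Sufficient: some counterfactual flips the outcome while
    # changing only features inside the subset.
    return any(changed_features(cf, x) <= set(subset) for cf in cfs)

def is_necessary(subset, x, cfs):
    # Necessary: every counterfactual changes at least one
    # feature in the subset.
    return all(changed_features(cf, x) & set(subset) for cf in cfs)

print(is_sufficient({"income"}, query, counterfactuals))  # True
print(is_necessary({"income"}, query, counterfactuals))   # False
```

Under this reading, restricting counterfactual generation to the top-k features of LIME or SHAP and checking whether any valid counterfactual exists is exactly a sufficiency test of that feature set.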