Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's prediction. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, especially on whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. As a result, we highlight the complementarity of these two approaches. Our evaluation on three benchmark datasets (Adult-Income, LendingClub, and German-Credit) confirms this complementarity. Feature attribution methods like LIME and SHAP and counterfactual explanation methods like Wachter et al. and DiCE often do not agree on feature importance rankings. In addition, by restricting the features that can be modified when generating counterfactual examples, we find that the top-k features from LIME or SHAP are often neither necessary nor sufficient explanations of a model's prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.
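The necessity and sufficiency checks sketched above can be illustrated on a toy example. The following is a hypothetical sketch, not the paper's actual method: the model, feature names, and thresholds are invented for illustration. A feature subset is treated as sufficient if some counterfactual flips the prediction by changing only features in that subset, and necessary if every counterfactual changes at least one feature in it.

```python
# Hypothetical illustration of necessity/sufficiency of a feature subset,
# given a query instance and counterfactual examples that flip the prediction.
# The model and data below are toy stand-ins, not from the paper.

def model(x):
    # Toy classifier: approve (1) if income + credit score is high enough.
    return 1 if x["income"] + x["credit"] > 100 else 0

query = {"income": 40, "credit": 50}   # predicted 0 (rejected)
counterfactuals = [                    # minimal changes that flip to 1
    {"income": 60, "credit": 50},
    {"income": 40, "credit": 70},
    {"income": 55, "credit": 60},
]

def changed_features(cf, x):
    return {f for f in x if cf[f] != x[f]}

def is_sufficient(subset, x, cfs):
    # Sufficient: some counterfactual flips the outcome while
    # changing only features inside the subset.
    return any(changed_features(cf, x) <= set(subset) for cf in cfs)

def is_necessary(subset, x, cfs):
    # Necessary: every counterfactual changes at least one
    # feature in the subset.
    return all(changed_features(cf, x) & set(subset) for cf in cfs)

print(is_sufficient({"income"}, query, counterfactuals))  # True
print(is_necessary({"income"}, query, counterfactuals))   # False
```

Under this reading, restricting counterfactual generation to the top-k features of LIME or SHAP and checking whether any valid counterfactual exists is exactly a sufficiency test of that feature set.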