We examine counterfactual explanations for the decisions made by model-based AI systems. The counterfactual approach we consider defines an explanation as a set of the system's data inputs that causally drives the decision (i.e., changing the inputs in the set changes the decision) and is irreducible (i.e., changing any proper subset of those inputs does not change the decision). We (1) demonstrate how this framework may be used to provide explanations for decisions made by general, data-driven AI systems that may incorporate features with arbitrary data types and multiple predictive models, and (2) propose a heuristic procedure to find the most useful explanations depending on the context. We then contrast counterfactual explanations with methods that explain model predictions by weighting features according to their importance (e.g., SHAP, LIME) and present two fundamental reasons why we should carefully consider whether importance-weight explanations are well suited to explaining system decisions. Specifically, we show that (i) features with large importance weights for a model prediction may not affect the corresponding decision, and (ii) importance weights are insufficient to communicate whether and how features influence decisions. We demonstrate these points with several concise examples and three detailed case studies that compare the counterfactual approach with SHAP, illustrating conditions under which counterfactual explanations explain data-driven decisions better than importance weights.
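To make the definition above concrete, the following is a minimal sketch (not the authors' implementation) of the counterfactual notion of an explanation: a set of inputs whose change flips the decision (causal) and no proper subset of which does so (irreducible). The decision function, feature names, and alternative values are illustrative assumptions chosen only for this example.

```python
# Minimal illustrative sketch of counterfactual explanations as irreducible,
# decision-flipping sets of inputs. All concrete values below are hypothetical.
from itertools import combinations

def decide(applicant):
    """Hypothetical loan-approval decision built on a model score."""
    score = 0.4 * applicant["income"] / 50_000 + 0.6 * (1 - applicant["debt_ratio"])
    return score >= 0.7  # True = approve

def flips_decision(factual, alternative, subset):
    """Replace the features in `subset` with their alternative values
    and check whether the decision changes."""
    modified = dict(factual)
    for feature in subset:
        modified[feature] = alternative[feature]
    return decide(modified) != decide(factual)

def counterfactual_explanations(factual, alternative):
    """Enumerate irreducible feature sets whose change flips the decision."""
    features = list(factual)
    explanations = []
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            if not flips_decision(factual, alternative, subset):
                continue
            # Irreducibility: skip sets that contain an already-found explanation,
            # since some proper subset would then flip the decision on its own.
            if any(set(e) < set(subset) for e in explanations):
                continue
            explanations.append(subset)
    return explanations

factual = {"income": 40_000, "debt_ratio": 0.5}      # rejected applicant
alternative = {"income": 80_000, "debt_ratio": 0.1}  # contrasting input values
print(counterfactual_explanations(factual, alternative))
# -> [('income',), ('debt_ratio',)]: each change alone flips the decision,
#    so the pair of both changes is reducible and is not reported.
```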