Feature attribution methods are popular for explaining neural network predictions, and they are often evaluated on metrics such as comprehensiveness and sufficiency, which are motivated by the principle that more important features -- as judged by the explanation -- should have larger impacts on the model prediction. In this paper, we highlight an intriguing property of these metrics: their solvability. Concretely, we can define the problem of optimizing an explanation for a metric and solve it with beam search. This raises an obvious question: given such solvability, why do we still develop other explainers and then evaluate them on the metric? We present a series of investigations showing that this beam search explainer is generally comparable to or better than current choices such as LIME and SHAP, suggest rethinking the goals of model interpretability, and identify several directions towards better evaluations of new method proposals.
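As a minimal sketch of the idea (not the authors' implementation), directly optimizing an explanation for comprehensiveness amounts to searching for a small feature set whose removal most reduces the predicted-class probability, which a beam search can approximate. The interface below is assumed for illustration: `predict_prob` maps a token list to the probability of the originally predicted class, and `beam_width`, `k`, and `mask` are hypothetical parameters.

```python
def beam_search_explanation(tokens, predict_prob, k, beam_width=10, mask="[MASK]"):
    """Greedily grow a size-k feature set whose removal maximally drops the
    predicted-class probability, i.e., a set with high comprehensiveness.

    predict_prob: callable taking a list of tokens and returning the
    probability of the originally predicted class (assumed interface).
    """
    base = predict_prob(tokens)

    def comprehensiveness(removed):
        # Mask out the selected features and measure the drop in probability.
        masked = [mask if i in removed else t for i, t in enumerate(tokens)]
        return base - predict_prob(masked)

    beams = [frozenset()]
    for _ in range(k):
        # Expand every partial explanation by one additional feature.
        candidates = {b | {i} for b in beams for i in range(len(tokens)) if i not in b}
        # Keep only the beam_width highest-scoring partial explanations.
        beams = sorted(candidates, key=comprehensiveness, reverse=True)[:beam_width]
    return max(beams, key=comprehensiveness)
```

The same search applies to sufficiency by scoring the kept features instead of the removed ones; the sketch only illustrates why metrics defined through input perturbations are directly optimizable.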