Feature attribution methods are popular for explaining neural network predictions, and they are often evaluated on metrics such as comprehensiveness and sufficiency. In this paper, we highlight an intriguing property of these metrics: their solvability. Concretely, we can define the problem of optimizing an explanation for a metric, which can be solved by beam search. This observation leads to an obvious yet unaddressed question: if the metric value represents explanation quality, why do we use explainers (e.g., LIME) that are not based on solving for the target metric? We present a series of investigations showing strong performance of this beam search explainer and discuss its broader implication: a definition-evaluation duality of interpretability concepts. We implement the explainer and release the solvex Python package for models in the text, image, and tabular domains.
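To illustrate the idea of solving for a metric, the sketch below shows a minimal beam search that selects a fixed-size feature subset maximizing comprehensiveness, i.e., the drop in the model's predicted probability when the selected features are removed. This is an illustrative assumption-laden example, not the solvex API: the `predict_proba` model and all parameter choices (subset size, beam width, zero-masking) are hypothetical stand-ins.

```python
# Minimal sketch (hypothetical, not the solvex API): beam search over feature
# subsets, scored by comprehensiveness = p(x) - p(x with features removed).
import numpy as np

def predict_proba(x):
    """Hypothetical model: a fixed linear scorer followed by a sigmoid."""
    weights = np.array([0.9, -0.2, 1.5, 0.1, -1.0, 0.4])
    return 1.0 / (1.0 + np.exp(-(x @ weights)))

def comprehensiveness(x, removed_idx):
    """Drop in predicted probability after zeroing out the chosen features."""
    x_masked = x.copy()
    x_masked[list(removed_idx)] = 0.0  # "remove" features by masking to zero
    return predict_proba(x) - predict_proba(x_masked)

def beam_search_explainer(x, k=3, beam_width=5):
    """Beam search for a size-k feature subset that maximizes comprehensiveness."""
    beams = [((), 0.0)]  # each beam: (selected feature indices, metric value)
    for _ in range(k):
        candidates = {}
        for subset, _ in beams:
            for i in range(len(x)):
                if i in subset:
                    continue
                new_subset = tuple(sorted(subset + (i,)))
                if new_subset not in candidates:
                    candidates[new_subset] = comprehensiveness(x, new_subset)
        # keep the top-scoring subsets as the next beam
        beams = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:beam_width]
    return beams[0]

x = np.array([1.0, 0.5, 2.0, 0.3, 1.2, 0.8])
subset, score = beam_search_explainer(x)
print("explanation (feature indices):", subset, "comprehensiveness:", round(score, 3))
```

The same search could instead minimize sufficiency (the probability drop when keeping only the selected features) by swapping the scoring function; the beam structure is unchanged.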