A multitude of explainability methods and theoretical evaluation scores have been proposed. However, it is not yet known (1) how useful these methods are in real-world scenarios, or (2) how well theoretical measures predict their practical usefulness for a human. To fill this gap, we conducted large-scale human psychophysics experiments to evaluate the ability of human participants (n=1,150) to leverage representative attribution methods to learn to predict the decisions of different image classifiers. Our results demonstrate that the theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios. Furthermore, the degree to which individual attribution methods helped human participants predict classifiers' decisions varied widely across categorization tasks and datasets. Overall, our results highlight fundamental challenges for the field -- suggesting a critical need to develop better explainability methods and to deploy human-centered evaluation approaches. We will make the code of our framework available to ease the systematic evaluation of novel explainability methods.