Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black-box nature. To overcome this, various post-hoc attribution methods have been proposed to identify the image regions most influential to the models' decisions. Evaluating such methods is challenging, since no ground-truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them fairer, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output, in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison; we therefore evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study the strengths and shortcomings of some widely used attribution methods across a wide range of models. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.
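To make the DiFull setting concrete: because the model is constructed so that only a known region of the input can influence the output, any attribution mass placed outside that region is provably misattributed. The following is a minimal sketch of this evaluation idea, not the paper's implementation; the names `localisation_score` and `possible_mask` are hypothetical, and the toy 4x4 grid stands in for the actual controlled setup.

```python
import numpy as np

def localisation_score(attribution: np.ndarray, possible_mask: np.ndarray) -> float:
    """Fraction of positive attribution falling inside the region that, by
    construction, is the only part of the input able to influence the output.
    A faithful method should score close to 1; mass outside the mask is
    impossible by design."""
    pos = np.clip(attribution, 0.0, None)   # keep positive evidence only
    total = pos.sum()
    if total == 0.0:
        return 0.0                          # degenerate map: no evidence anywhere
    return float(pos[possible_mask.astype(bool)].sum() / total)

# Toy usage: a 4x4 grid where only the top-left 2x2 cell can affect the output.
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
attr = np.random.rand(4, 4)                 # stand-in for a real attribution map
print(localisation_score(attr, mask))
```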
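The abstract leaves the smoothing step unspecified; as a rough sketch, assuming a simple Gaussian kernel (one natural choice for spreading sharply localized, pixel-level attributions over a neighbourhood), the post-processing could look like:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_attribution(attribution: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Blur a 2D attribution map with a Gaussian kernel of width `sigma`,
    so that isolated pixel-level evidence is spread over its neighbourhood.
    The choice of a Gaussian kernel here is an assumption, not the paper's
    confirmed procedure."""
    return gaussian_filter(attribution, sigma=sigma)

# Usage: smooth a stand-in attribution map before evaluation.
attr = np.random.rand(224, 224)
attr_smoothed = smooth_attribution(attr, sigma=2.0)
```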