While the evaluation of explanations is an important step towards trustworthy models, it needs to be done carefully, and the employed metrics need to be well understood. Specifically, model randomization testing is often overestimated and regarded as the sole criterion for selecting or discarding certain explanation methods. To address the shortcomings of this test, we start by observing an experimental gap in the ranking of explanation methods between randomization-based sanity checks [1] and model output faithfulness measures (e.g. [25]). We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations. First, we show that uninformative attribution maps created with zero pixel-wise covariance easily achieve high scores in this type of check. Second, we show that top-down model randomization preserves the scales of forward pass activations with high probability. That is, channels with large activations have a high probability of contributing strongly to the output, even after the network layers above them have been randomized. Hence, explanations after randomization can only be expected to differ to a certain extent. This explains the observed experimental gap. In summary, these results demonstrate the inadequacy of model-randomization-based sanity checks as a criterion to rank attribution methods.
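The first point can be illustrated with a minimal sketch. Model-randomization sanity checks score an attribution method by the (dis)similarity between the attribution map computed on the trained model and the map computed after the model's weights are randomized; low similarity is read as passing. The snippet below assumes Spearman rank correlation as the similarity measure, one of the measures used in [1]; the specific shapes and seed are illustrative, not taken from the paper's experiments.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Two independent noise maps stand in for "uninformative" attributions:
# one for the trained model, one for the randomized model. By construction
# their pixel-wise covariance is (approximately) zero.
attr_trained = rng.normal(size=(224, 224)).ravel()
attr_randomized = rng.normal(size=(224, 224)).ravel()

rho, _ = spearmanr(attr_trained, attr_randomized)
print(f"Spearman rank correlation: {rho:.4f}")  # close to 0

# A correlation near zero is exactly what the sanity check rewards,
# even though neither map carries any information about the model.
```

Under this (assumed) scoring scheme, the uninformative noise maps would "pass" the randomization check, which is the failure mode the text above describes.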