A plethora of methods have been proposed to explain how deep neural networks reach their decisions, but comparatively little effort has been made to ensure that the explanations produced by these methods are objectively relevant. While several desirable properties for trustworthy explanations have been formulated, objective measures have been harder to derive. Here, we propose two new measures, borrowed from the field of algorithmic stability, to evaluate explanations: mean generalizability (MeGe) and relative consistency (ReCo). We conduct extensive experiments on different network architectures, common explainability methods, and several image datasets to demonstrate the benefits of the proposed measures. In comparison to our measures, popular fidelity measures are not sufficient to guarantee trustworthy explanations. Finally, we found that 1-Lipschitz networks produce explanations with higher MeGe and ReCo than common neural networks while reaching similar accuracy. This suggests that 1-Lipschitz networks are a relevant direction towards predictors that are more explainable and trustworthy.