Despite the excellent performance of deep neural networks (DNNs) in image classification, detection, and prediction, characterizing how DNNs arrive at a given decision remains an open problem, motivating a number of interpretability methods. Post-hoc interpretability methods primarily aim to quantify the importance of input features with respect to the class probabilities. However, due to the lack of ground truth and the existence of interpretability methods with diverse operating characteristics, evaluating these methods is a crucial challenge. A popular approach to evaluating interpretability methods is to perturb the input features deemed important for a given prediction and observe the decrease in accuracy. However, perturbation itself may introduce artifacts, since perturbed images may be out-of-distribution (OOD). In this paper, we conduct computational experiments to estimate the contribution of perturbation artifacts and develop a method to estimate the fidelity of interpretability methods. We demonstrate that, while perturbation artifacts do exist, we can minimize and characterize their impact on fidelity estimation by utilizing model accuracy curves obtained by perturbing input features in Most Important First (MIF) and Least Important First (LIF) order. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation on four popular post-hoc interpretability methods.
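The MIF/LIF accuracy curves can be sketched in a few lines of PyTorch. The following is a minimal illustration under stated assumptions, not the authors' exact protocol: it assumes a precomputed per-pixel attribution map, replaces perturbed pixels with a zero baseline (other choices, such as blurring or the dataset mean, are common), and the helper name `accuracy_curve` and the step count are hypothetical.

```python
import torch
import torchvision.models as models

def accuracy_curve(model, image, label, attr, order="MIF", steps=10, baseline=0.0):
    """Perturb pixels in attribution order and record prediction correctness.

    image: normalized input tensor of shape (C, H, W)
    attr:  per-pixel attribution map of shape (H, W)
    order: "MIF" perturbs most important pixels first; "LIF" least important first
    baseline: replacement value for perturbed pixels (an assumed choice)
    """
    h, w = attr.shape
    idx = torch.argsort(attr.flatten(), descending=(order == "MIF"))
    perturbed = image.clone()
    curve = []
    chunk = len(idx) // steps
    for s in range(steps):
        # Cumulatively mask out the next chunk of pixels in the chosen order.
        sel = idx[s * chunk:(s + 1) * chunk]
        mask = torch.zeros(h * w, dtype=torch.bool)
        mask[sel] = True
        perturbed[:, mask.view(h, w)] = baseline  # applied across all channels
        with torch.no_grad():
            pred = model(perturbed.unsqueeze(0)).argmax(dim=1).item()
        curve.append(int(pred == label))
    return curve

model = models.resnet50(weights="IMAGENET1K_V1").eval()
```

Averaging such curves over a test set yields the MIF and LIF accuracy curves whose comparison underlies the fidelity estimate described above.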