Explaining the decisions of models is becoming pervasive in the image processing domain, whether by using post-hoc methods or by creating inherently interpretable models. While the widespread use of surrogate explainers is a welcome addition for inspecting and understanding black-box models, assessing the robustness and reliability of the explanations is key to their success. Additionally, whilst existing work in the explainability field proposes various strategies to address this problem, the challenges of working with data in the wild are often overlooked. For instance, in image classification, distortions to images can affect not only the predictions assigned by the model, but also the explanation. Given a clean and a distorted version of an image, even if the prediction probabilities are similar, the explanation may still differ. In this paper we propose a methodology to evaluate the effect of distortions on explanations by embedding perceptual distances that tailor the neighbourhoods used to train surrogate explainers. We also show that by operating in this way, we can make the explanations more robust to distortions. We generate explanations for images in the ImageNet-C dataset and demonstrate how using a perceptual distance in the surrogate explainer creates more coherent explanations for the distorted and reference images.
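To make the idea of tailoring the surrogate neighbourhood concrete, the sketch below shows one way a LIME-style surrogate explainer could weight its perturbed samples by a perceptual distance (here 1 - SSIM to the reference image) instead of a distance on the binary interpretable representation. This is an illustrative sketch only, not the authors' implementation: the `black_box_predict` callable, the segment count, and the kernel width are assumptions introduced for the example.

```python
# Illustrative sketch (not the paper's implementation): a LIME-style surrogate
# whose locality weights come from a perceptual distance (1 - SSIM) between each
# perturbed image and the reference image, rather than from the binary masks.
import numpy as np
from skimage.segmentation import slic
from skimage.metrics import structural_similarity as ssim
from sklearn.linear_model import Ridge


def explain_with_perceptual_kernel(image, black_box_predict, n_samples=500,
                                   n_segments=50, kernel_width=0.25, seed=0):
    """Return per-segment importance weights for `image`.

    `black_box_predict` is an assumed callable mapping a batch of images
    (N, H, W, C) to prediction scores for the class of interest.
    """
    rng = np.random.default_rng(seed)
    segments = slic(image, n_segments=n_segments, start_label=0)
    n_seg = segments.max() + 1

    # Interpretable samples: binary masks indicating which segments are kept.
    masks = rng.integers(0, 2, size=(n_samples, n_seg))
    masks[0, :] = 1  # keep the unperturbed reference image in the neighbourhood

    perturbed, weights = [], []
    for m in masks:
        img = image.copy()
        img[~np.isin(segments, np.flatnonzero(m))] = 0  # occlude dropped segments
        perturbed.append(img)
        # Perceptual distance to the reference, turned into a locality weight
        # with an exponential kernel.
        d = 1.0 - ssim(image, img, channel_axis=-1,
                       data_range=float(image.max() - image.min()))
        weights.append(np.exp(-(d ** 2) / kernel_width ** 2))

    preds = black_box_predict(np.stack(perturbed))
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=np.array(weights))
    return surrogate.coef_  # one importance value per segment
```

Under this weighting, a heavily occluded (perceptually distant) perturbation contributes little to the surrogate fit, so the same kernel can be applied consistently to a clean image and its distorted counterpart when comparing their explanations.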