Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance on many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone -- without an account of how that answer was derived -- is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image -- hardly influencing the network's output -- we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. We compare our method's performance on two benchmark datasets -- CIFAR100 and ImageNet -- using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that XAI methods can be manipulated without the use of gradients or other model internals. Our algorithm successfully manipulates an image, in a manner imperceptible to the human eye, such that the XAI method outputs a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandated.
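To make the black-box setting concrete, the sketch below shows one way such a gradient-free, evolution-strategy attack could be organized: candidate perturbations are sampled, scored only via queries to the classifier's logits and to the explanation map, and aggregated into an update that pushes the explanation toward a chosen target while keeping the prediction close to the original. This is a minimal illustration, not the paper's AttaXAI implementation; the functions `model_logits` and `explain_fn`, the loss weights, and the NES-style update are assumptions made here for clarity.

```python
import numpy as np

def es_explanation_attack(x, target_map, model_logits, explain_fn,
                          pop_size=20, sigma=0.05, lr=0.01, steps=200,
                          alpha=1.0, beta=1.0):
    """Illustrative black-box, gradient-free attack on an explanation map.

    x            -- input image as a float array in [0, 1]
    target_map   -- desired explanation map (same spatial shape as explain_fn's output)
    model_logits -- callable returning the classifier's output logits for an image (query access only)
    explain_fn   -- callable returning the explanation map for an image (query access only)
    """
    x_adv = x.copy()
    base_logits = model_logits(x)  # logits of the unmodified image, queried once

    for _ in range(steps):
        # Sample a population of Gaussian perturbations around the current image.
        noise = np.random.randn(pop_size, *x.shape)
        losses = np.empty(pop_size)
        for i in range(pop_size):
            cand = np.clip(x_adv + sigma * noise[i], 0.0, 1.0)
            # Loss: distance of the explanation from the target,
            # plus drift of the logits away from the original prediction.
            expl_term = np.mean((explain_fn(cand) - target_map) ** 2)
            pred_term = np.mean((model_logits(cand) - base_logits) ** 2)
            losses[i] = alpha * expl_term + beta * pred_term

        # NES-style gradient estimate with fitness shaping (standardized losses).
        z = (losses - losses.mean()) / (losses.std() + 1e-8)
        grad_est = (z.reshape(pop_size, *([1] * x.ndim)) * noise).mean(axis=0) / sigma

        # Descend the estimated gradient and keep the image in a valid range.
        x_adv = np.clip(x_adv - lr * grad_est, 0.0, 1.0)

    return x_adv
```

The key design point mirrored here is that the search never touches model weights or gradients: every evaluation is a forward query for logits and an explanation map, which is exactly the weak access assumption stated above.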