Interpretability of Deep Learning (DL) models is arguably a major barrier to trustworthy AI. Despite great efforts made by the Explainable AI (XAI) community, explanations lack robustness--indistinguishable input perturbations may lead to different XAI results. Thus, it is vital to assess how robust DL interpretability is, given an XAI technique. To this end, we identify the following challenges that the state of the art is unable to cope with collectively: i) XAI techniques are highly heterogeneous; ii) misinterpretations are normally rare events; iii) both worst-case and overall robustness are of practical interest. In this paper, we propose two evaluation methods to tackle them--i) they are of black-box nature, based on Genetic Algorithm (GA) and Subset Simulation (SS); ii) bespoke fitness functions are used by GA to solve a constrained optimisation problem efficiently, while SS is dedicated to estimating rare event probabilities; iii) two diverse metrics are introduced, concerning the worst-case interpretation discrepancy and a probabilistic notion of \textit{how} robust the interpretation is in general, respectively. We conduct experiments to study the accuracy, sensitivity and efficiency of our methods, which outperform the state of the art. Finally, we show two applications of our methods for ranking robust XAI methods and selecting training schemes to improve both classification and interpretation robustness.
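To make the rare-event estimation concrete, the following is a minimal, hypothetical Python sketch of Subset Simulation for estimating the small probability that an interpretation discrepancy exceeds a tolerance; the callables \texttt{sample\_input}, \texttt{perturb} and \texttt{discrepancy}, and all parameter values, are illustrative assumptions rather than the paper's actual implementation.

\begin{verbatim}
import numpy as np

def subset_simulation(sample_input, perturb, discrepancy,
                      threshold=0.5, n=1000, p0=0.1, max_levels=10, seed=0):
    """Estimate the rare-event probability P(discrepancy(x) >= threshold).

    Subset Simulation splits the rare event into a chain of more frequent
    intermediate events and multiplies their conditional probabilities.
    `sample_input`, `perturb` and `discrepancy` are user-supplied callables
    (placeholders here, not the paper's implementation).
    """
    rng = np.random.default_rng(seed)
    x = np.array([sample_input(rng) for _ in range(n)])   # level-0 samples
    g = np.array([discrepancy(xi) for xi in x])
    prob, n_keep = 1.0, int(p0 * n)
    for _ in range(max_levels):
        idx = np.argsort(g)[-n_keep:]          # top p0 fraction of samples
        level = g[idx].min()                   # intermediate threshold
        if level >= threshold:                 # target event reached
            break
        prob *= p0
        seeds, seed_g = x[idx], g[idx]
        new_x, new_g = [], []
        # Re-populate the level with conditional samples above `level`
        # using a simple accept/reject move (a stand-in for the usual
        # modified Metropolis-Hastings sampler).
        for i in range(n):
            xi, gi = seeds[i % n_keep], seed_g[i % n_keep]
            cand = perturb(xi, rng)
            gc = discrepancy(cand)
            if gc >= level:
                xi, gi = cand, gc
            new_x.append(xi)
            new_g.append(gi)
        x, g = np.array(new_x), np.array(new_g)
    return prob * float(np.mean(g >= threshold))
\end{verbatim}

Because misinterpretations are rare events, direct Monte Carlo would require prohibitively many samples to observe any; chaining intermediate events whose conditional probabilities are roughly $p_0$ each keeps the sample budget manageable.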