The most popular methods and algorithms in AI are, for the most part, black boxes. Black boxes can be an acceptable solution for problems of low impact, but they have a fatal flaw for all others. Explanation tools for them have therefore been developed rapidly, yet the evaluation of their quality remains an open research question. In this technical report, we revisit the recently proposed post-hoc explainers FEM and MLFEM, which were designed to explain CNNs in image and video classification tasks. We also propose their evaluation with reference-based and no-reference metrics. The reference-based metrics are the Pearson Correlation Coefficient and Similarity, computed between the explanation maps and the ground truth, represented by Gaze Fixation Density Maps obtained through a psycho-visual experiment. As a no-reference metric, we use the "stability" metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with the reference-based metrics, and show that under several kinds of degradations of the input images this metric agrees with the reference-based ones. It can therefore be used to evaluate the quality of explainers when ground truth is not available.
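To make the evaluation protocol concrete, the following minimal Python sketch illustrates the two reference-based metrics (Pearson Correlation Coefficient and Similarity between an explanation map and a Gaze Fixation Density Map) and a local-Lipschitz-style stability estimate in the spirit of Alvarez-Melis and Jaakkola. This is not the authors' implementation; array names, the normalisation choices, and the perturbation scheme are illustrative assumptions.

    # Illustrative sketch (not the report's code) of the evaluation metrics.
    import numpy as np

    def pearson_cc(explanation: np.ndarray, gfdm: np.ndarray) -> float:
        # PCC between an explanation map and a GFDM of the same shape.
        x = explanation.astype(np.float64).ravel()
        y = gfdm.astype(np.float64).ravel()
        return float(np.corrcoef(x, y)[0, 1])

    def similarity(explanation: np.ndarray, gfdm: np.ndarray) -> float:
        # SIM: histogram intersection after normalising each map to sum to 1.
        x = explanation.astype(np.float64).ravel()
        y = gfdm.astype(np.float64).ravel()
        x, y = x / x.sum(), y / y.sum()
        return float(np.minimum(x, y).sum())

    def stability(explainer, x: np.ndarray, radius: float = 0.05,
                  n_samples: int = 20, seed: int = 0) -> float:
        # No-reference "stability": worst-case ratio of explanation change to
        # input change over small random perturbations of the input
        # (a simplified reading of Alvarez-Melis and Jaakkola's robustness).
        rng = np.random.default_rng(seed)
        base = explainer(x).astype(np.float64).ravel()
        worst = 0.0
        for _ in range(n_samples):
            x_pert = x + rng.uniform(-radius, radius, size=x.shape)
            e = explainer(x_pert).astype(np.float64).ravel()
            num = np.linalg.norm(base - e)
            den = np.linalg.norm(x.ravel() - x_pert.ravel()) + 1e-12
            worst = max(worst, num / den)
        return worst

    # Usage example with random stand-ins for an explanation map and a GFDM.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        exp_map, gt_map = rng.random((224, 224)), rng.random((224, 224))
        print("PCC:", pearson_cc(exp_map, gt_map))
        print("SIM:", similarity(exp_map, gt_map))

Lower stability values indicate explanations that change less than the input does under perturbation, which is the sense in which the no-reference metric can stand in for the reference-based ones when no gaze data are available.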