Explainable artificial intelligence is proposed to provide explanations for the reasoning performed by an artificial intelligence system. There is no consensus on how to evaluate the quality of these explanations, since even the definition of explanation itself is not clear in the literature. In particular, for the widely known Local Linear Explanations, there are qualitative proposals for the evaluation of explanations, although they suffer from theoretical inconsistencies. The case of images is even more problematic: a visual explanation may seem to explain a decision when what it really does is detect edges. The literature already contains a large number of metrics specialized in quantitatively measuring different qualitative aspects, so it should be possible to develop metrics capable of measuring the desirable aspects of explanations in a robust and correct way. In this paper, we propose a procedure called REVEL to evaluate different aspects of the quality of explanations with a theoretically coherent development. This procedure provides several advances over the state of the art: it standardizes the concept of explanation and develops a series of metrics that not only allow explanations to be compared with each other but also provide absolute information about the explanation itself. Experiments have been carried out on four image datasets as benchmarks, where we show REVEL's descriptive and analytical power.