The emerging field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not providing information about what they represent), global explanation techniques visualize what concepts a model has generally learned to encode. Both types of methods thus provide only partial insights and leave the burden of interpreting the model's reasoning to the user. Only a few contemporary techniques aim to combine the principles behind both local and global XAI to obtain more informative explanations. Those methods, however, are often limited to specific model architectures or impose additional requirements on training regimes or data and label availability, which renders post-hoc application to arbitrarily pre-trained models practically impossible. In this work we introduce the Concept Relevance Propagation (CRP) approach, which combines the local and global perspectives of XAI and thus allows answering both the "where" and "what" questions for individual predictions, without imposing additional constraints. We further introduce the principle of Relevance Maximization for finding representative examples of encoded concepts based on their usefulness to the model. We thereby lift the dependency on the common practice of Activation Maximization and its limitations. We demonstrate the capabilities of our methods in various settings, showcasing that Concept Relevance Propagation and Relevance Maximization lead to more human-interpretable explanations and provide deep insights into the model's representations and reasoning through concept atlases, concept composition analyses, and quantitative investigations of concept subspaces and their role in fine-grained decision making.
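To make the mechanics of conditional relevance propagation concrete, the following is a minimal sketch under simplifying assumptions, not the reference implementation: it applies the standard LRP-ε rule to a toy two-layer ReLU network and masks the backward relevance flow to a single hidden unit, which here stands in for a learned concept. The function names `lrp_eps` and `crp_attribution` are hypothetical.

```python
import numpy as np

def lrp_eps(a, w, r_out, eps=1e-6):
    # One LRP-epsilon backward step for a linear layer z = a @ w:
    # distributes the output relevance r_out onto the inputs a.
    z = a @ w
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilize the division
    s = r_out / z
    return a * (s @ w.T)

def crp_attribution(x, w1, w2, concept_id):
    # Forward pass through a tiny two-layer ReLU network.
    h = np.maximum(x @ w1, 0.0)
    y = h @ w2

    # Initialize relevance at the output: explain the top-scoring class.
    r_y = np.zeros_like(y)
    r_y[np.argmax(y)] = y[np.argmax(y)]

    # Propagate relevance to the hidden layer ...
    r_h = lrp_eps(h, w2, r_y)

    # ... then apply the CRP condition: keep only the relevance flowing
    # through the chosen hidden unit ("concept") before continuing to the
    # input, yielding a concept-conditional attribution map.
    mask = np.zeros_like(r_h)
    mask[concept_id] = 1.0
    return lrp_eps(x, w1, r_h * mask)
```

Summing the returned map over all concepts recovers the unconditional attribution, which is what lets CRP decompose a single "where" heatmap into per-concept "what" contributions.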
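The contrast between Activation Maximization and Relevance Maximization can likewise be sketched: both rank candidate reference samples for a concept, but the former scores a sample by how strongly the concept activates on it, while the latter scores it by how much relevance the concept receives for the model's actual prediction. The callbacks `activation_of` and `relevance_of` below are assumed helpers (e.g., the concept's mean activation on a sample, and the summed conditional relevance from a CRP pass such as `crp_attribution` above, respectively).

```python
def reference_samples(dataset, activation_of, relevance_of, k=8):
    # Activation Maximization: samples on which the concept fires most
    # strongly, regardless of whether the decision actually uses it.
    by_activation = sorted(dataset, key=activation_of, reverse=True)[:k]
    # Relevance Maximization: samples on which the concept is most *useful*
    # to the prediction, i.e. receives the most conditional relevance.
    by_relevance = sorted(dataset, key=relevance_of, reverse=True)[:k]
    return by_activation, by_relevance
```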