Concept-based explanations allow the predictions of a deep neural network (DNN) to be understood through the lens of concepts specified by users. Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the DNN's latent space. When this holds true, the concept can be represented by a concept activation vector (CAV) pointing in that direction. In this work, we propose to relax this assumption by allowing concept examples to be scattered across different clusters in the DNN's latent space. Each concept is then represented by a region of the DNN's latent space that includes these clusters and that we call a concept activation region (CAR). To formalize this idea, we introduce an extension of the CAV formalism that is based on the kernel trick and support vector classifiers. This CAR formalism yields global concept-based explanations and local concept-based feature importance. We prove that CAR explanations built with radial kernels are invariant under latent space isometries. In this way, CAR assigns the same explanations to latent spaces that have the same geometry. We further demonstrate empirically that CARs offer (1) more accurate descriptions of how concepts are scattered in the DNN's latent space; (2) global explanations that are closer to human concept annotations; and (3) concept-based feature importance scores that meaningfully relate concepts with each other. Finally, we use CARs to show that DNNs can autonomously rediscover known scientific concepts, such as the prostate cancer grading system.
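As a minimal sketch of the idea described above (not the authors' implementation): a concept can be represented as a region of a DNN's latent space by fitting a kernel support vector classifier on the latent representations of positive and negative concept examples. All names below (the toy `feature_extractor`, the random concept sets, the `concept_density` helper) are hypothetical placeholders.

```python
# Sketch: representing a concept as a Concept Activation Region (CAR)
# via a radial-kernel support vector classifier in a DNN's latent space.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

# Hypothetical feature extractor g of a DNN f = h ∘ g (h is the head).
feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 50), nn.ReLU())

# Hypothetical concept sets: examples that exhibit the concept (positives)
# and examples that do not (negatives).
x_pos = torch.randn(100, 1, 28, 28)
x_neg = torch.randn(100, 1, 28, 28)

with torch.no_grad():
    h_pos = feature_extractor(x_pos).numpy()  # latent representations of positives
    h_neg = feature_extractor(x_neg).numpy()  # latent representations of negatives

H = np.concatenate([h_pos, h_neg])
y = np.concatenate([np.ones(len(h_pos)), np.zeros(len(h_neg))])

# Radial (RBF) kernel SVC: the region of latent space where it predicts 1
# plays the role of the CAR. Because the RBF kernel depends only on pairwise
# distances, the resulting region is unchanged by latent space isometries.
car_classifier = SVC(kernel="rbf").fit(H, y)

def concept_density(x: torch.Tensor) -> np.ndarray:
    """Signed distance of the latent representation to the CAR boundary."""
    with torch.no_grad():
        return car_classifier.decision_function(feature_extractor(x).numpy())

# Concept presence for new inputs via CAR membership and a smooth score.
x_new = torch.randn(5, 1, 28, 28)
print(car_classifier.predict(feature_extractor(x_new).detach().numpy()))
print(concept_density(x_new))
```

In this sketch, the hard CAR membership (`predict`) supports global concept-based explanations, while the smooth decision score can serve as a concept density for local analyses.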