Convolutional neural network (CNN) models for computer vision are powerful but lack explainability in their most basic form. This deficiency remains a key challenge when applying CNNs in high-stakes domains. Recent work on explanation via the feature importance of approximate linear models has moved from input-level features (pixels or segments) to features drawn from mid-layer feature maps, in the form of concept activation vectors (CAVs). CAVs carry concept-level information and can be learned via clustering. In this work, we rethink the ACE algorithm of Ghorbani et al. and propose an alternative invertible concept-based explanation (ICE) framework that overcomes its shortcomings. Guided by the requirements of fidelity (how closely the approximate model matches the target model) and interpretability (how meaningful the concepts are to people), we design measures and use them to evaluate a range of matrix factorization methods within our framework. We find that \emph{non-negative concept activation vectors} (NCAVs), obtained from non-negative matrix factorization (NMF), provide superior interpretability and fidelity in both computational and human-subject experiments. Our framework provides both local and global concept-level explanations for pre-trained CNN models.
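To make the NCAV idea concrete, the following is a minimal sketch (not the authors' released code) of how non-negative matrix factorization could be applied to post-ReLU mid-layer feature maps to obtain concept directions and spatial concept scores; the function and parameter names (ncavs_from_feature_maps, n_concepts) are illustrative, and scikit-learn's NMF is assumed as the factorizer.

```python
# Minimal sketch: deriving non-negative concept activation vectors (NCAVs)
# by factorizing mid-layer CNN feature maps with NMF.
import numpy as np
from sklearn.decomposition import NMF

def ncavs_from_feature_maps(feature_maps: np.ndarray, n_concepts: int = 10):
    """feature_maps: (n_images, h, w, c) post-ReLU activations (non-negative)."""
    n, h, w, c = feature_maps.shape
    V = feature_maps.reshape(n * h * w, c)      # one row per spatial position
    nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
    S = nmf.fit_transform(V)                    # concept scores per spatial position
    P = nmf.components_                         # (n_concepts, c) NCAV directions
    scores = S.reshape(n, h, w, n_concepts)     # spatial concept score maps per image
    return scores, P
```

Under this reading, the score maps can be upsampled to the input resolution for local explanations, while the rows of P act as global concept directions in the chosen layer's channel space.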