Human explanations of high-level decisions are often expressed in terms of the key concepts those decisions are based on. In this paper, we study such concept-based explainability for Deep Neural Networks (DNNs). First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is for explaining a model's prediction behavior, based on the assumption that complete concept scores are sufficient statistics of the model prediction. Next, we propose a concept discovery method that aims to infer a complete set of concepts that are additionally encouraged to be interpretable, which addresses the limitations of existing methods for concept explanations. To define an importance score for each discovered concept, we adapt game-theoretic notions to aggregate over sets and propose ConceptSHAP. Via the proposed metrics and user studies, on a synthetic dataset with a priori known concept explanations, as well as on real-world image and language datasets, we validate the effectiveness of our method in finding concepts that are both complete in explaining the decisions and interpretable. (The code is released at https://github.com/chihkuanyeh/concept_exp)
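To make the game-theoretic aggregation concrete, the sketch below computes a Shapley-style importance score for each concept from a completeness score defined on subsets of concepts. The completeness function `eta` is a placeholder assumption here (its exact definition is given in the paper), and the exhaustive subset enumeration is only meant to illustrate the aggregation, not the paper's implementation.

```python
from itertools import combinations
from math import factorial


def concept_shap_scores(num_concepts, eta):
    """Shapley-style importance for each concept.

    `eta` is assumed to be a callable mapping a tuple of concept indices
    to a scalar completeness score (its exact form is defined in the paper).
    The sum enumerates all subsets of the remaining concepts, so this
    illustration is only tractable for a small number of concepts.
    """
    m = num_concepts
    scores = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        s_i = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # Standard Shapley weight for a coalition of size k out of m players.
                weight = factorial(k) * factorial(m - k - 1) / factorial(m)
                # Marginal gain in completeness from adding concept i to this subset.
                s_i += weight * (eta(subset + (i,)) - eta(subset))
        scores.append(s_i)
    return scores
```

For example, with three concepts and a hypothetical `eta` that returns higher completeness for larger subsets, `concept_shap_scores(3, eta)` returns one importance value per concept, and the values sum to `eta((0, 1, 2)) - eta(())` by the efficiency property of Shapley values.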