Interpreting and explaining the behavior of deep neural networks is critical for many tasks. Explainable AI provides a way to address this challenge, mostly by attributing per-pixel relevance to the model's decision. Yet, interpreting such explanations may require expert knowledge. Some recent attempts toward interpretability adopt a concept-based framework, which relates model decisions to higher-level concepts. This paper proposes Bottleneck Concept Learner (BotCL), which represents an image solely by the presence/absence of concepts learned through training on the target task, without explicit supervision over the concepts. BotCL uses self-supervision and tailored regularizers so that the learned concepts are human-understandable. Using several image classification tasks as our testbed, we demonstrate BotCL's potential to rebuild neural networks for better interpretability. Code is available at https://github.com/wbw520/BotCL, and a simple demo is available at https://botcl.liangzhili.com/.
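To make the idea of a concept bottleneck concrete, the following is a minimal PyTorch sketch, not the official BotCL implementation (see the GitHub repository above): a convolutional backbone produces a spatial feature map, learnable concept prototypes are matched against spatial features, and the classifier sees only the resulting per-concept presence scores. The class name, the ResNet-18 backbone, the prototype parameterization, and the max-then-sigmoid aggregation are illustrative assumptions; the self-supervision and regularizers described in the paper are omitted.

```python
# Minimal concept-bottleneck sketch (illustrative only, not BotCL's exact design).
import torch
import torch.nn as nn
import torchvision.models as models


class ConceptBottleneckSketch(nn.Module):
    def __init__(self, num_concepts: int = 20, num_classes: int = 10):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep convolutional layers only; output is a (B, 512, H, W) feature map.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Learnable concept prototypes (hypothetical parameterization).
        self.concepts = nn.Parameter(torch.randn(num_concepts, 512))
        # The classifier sees only concept presence scores, i.e., the bottleneck.
        self.classifier = nn.Linear(num_concepts, num_classes)

    def forward(self, x: torch.Tensor):
        feat = self.backbone(x)                    # (B, 512, H, W)
        feat = feat.flatten(2).transpose(1, 2)     # (B, H*W, 512)
        # Response of each concept prototype at each spatial location.
        attn = torch.einsum("bnc,kc->bnk", feat, self.concepts)  # (B, H*W, K)
        # Concept presence score: strongest spatial response, squashed to (0, 1).
        presence = torch.sigmoid(attn.max(dim=1).values)          # (B, K)
        return self.classifier(presence), presence


if __name__ == "__main__":
    model = ConceptBottleneckSketch()
    logits, presence = model(torch.randn(2, 3, 224, 224))
    print(logits.shape, presence.shape)  # torch.Size([2, 10]) torch.Size([2, 20])
```

Because the classifier receives only the binary-like presence vector, each class prediction can be traced back to which concepts fired and where they responded in the image.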