Explainable artificial intelligence has rapidly emerged since lawmakers began requiring interpretable models in safety-critical domains. Concept-based neural networks have arisen as explainable-by-design methods, since they leverage human-understandable symbols (i.e., concepts) to predict class memberships. However, most of these approaches focus on identifying the most relevant concepts but do not provide concise, formal explanations of how such concepts are leveraged by the classifier to make predictions. In this paper, we propose a novel end-to-end differentiable approach that enables the extraction of logic explanations from neural networks using the formalism of First-Order Logic. The method relies on an entropy-based criterion which automatically identifies the most relevant concepts. We consider four different case studies to demonstrate that: (i) this entropy-based criterion enables the distillation of concise logic explanations in safety-critical domains, from clinical data to computer vision; (ii) the proposed approach outperforms state-of-the-art white-box models in terms of classification accuracy.
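To make the idea of an entropy-based concept-selection criterion concrete, the following is a minimal, illustrative sketch rather than the authors' reference implementation: the class name, hyperparameters, and the PyTorch framing are assumptions made purely for illustration. The sketch learns one relevance score per concept, turns the scores into a softmax attention mask, and exposes the entropy of that mask as a regularization term so that training favors relying on only a few concepts.

```python
# Minimal sketch (illustrative assumption, not the paper's reference code) of an
# entropy-based concept-selection layer: learnable per-concept scores become a
# softmax attention mask, and the mask's entropy is penalized during training
# so that only a few concepts remain relevant.
import torch
import torch.nn as nn


class EntropyConceptSelector(nn.Module):
    def __init__(self, n_concepts: int, temperature: float = 0.7):
        super().__init__()
        # One learnable relevance score per input concept.
        self.scores = nn.Parameter(torch.randn(n_concepts))
        self.temperature = temperature

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, n_concepts) concept activations in [0, 1].
        alpha = torch.softmax(self.scores / self.temperature, dim=0)
        # Rescale so the most relevant concept keeps weight 1;
        # less relevant concepts are shrunk towards 0.
        mask = alpha / alpha.max()
        return concepts * mask

    def entropy_loss(self) -> torch.Tensor:
        # Shannon entropy of the attention distribution; adding it to the
        # task loss pushes the mask towards a few dominant concepts.
        alpha = torch.softmax(self.scores / self.temperature, dim=0)
        return -(alpha * torch.log(alpha + 1e-12)).sum()


# Usage sketch: filter concepts, classify, and penalize high entropy.
selector = EntropyConceptSelector(n_concepts=10)
classifier = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))
x = torch.rand(32, 10)                     # batch of concept activations
logits = classifier(selector(x))
loss = logits.mean() + 1e-4 * selector.entropy_loss()  # placeholder task loss
loss.backward()
```

Because the surviving concepts are few and human-understandable, the classifier's behavior on them can then be summarized as compact First-Order Logic formulas, which is the kind of concise explanation the abstract refers to.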