Convolutional neural networks are increasingly used in critical systems, where ensuring their robustness and alignment is crucial. In this context, the field of explainable artificial intelligence has proposed the generation of high-level explanations through concept extraction. These methods detect whether a concept is present in an image, but cannot localize where it appears. Moreover, a fair comparison of approaches is difficult, as proper validation procedures are missing. To fill these gaps, we propose a novel method for automatic concept extraction and localization based on representations obtained through the pixel-wise aggregation of CNN activation maps. Further, we introduce a process for validating concept-extraction techniques based on synthetic datasets with pixel-wise annotations of their main components, reducing human intervention. Through extensive experimentation on both synthetic and real-world datasets, our method outperforms state-of-the-art alternatives.
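To make the core idea concrete, the sketch below illustrates one plausible reading of pixel-wise aggregation of activation maps: upsample each channel's activation map to image resolution, treat the stacked channel responses at each pixel as that pixel's representation, and cluster these vectors so that each cluster acts as a localized "concept" mask. This is a minimal illustration in NumPy with a toy k-means, not the paper's actual pipeline; all function names and the clustering choice are assumptions.

```python
import numpy as np

def pixelwise_representations(activations, out_hw):
    """Upsample each activation map to out_hw (nearest neighbour) and
    stack channels so every pixel gets a C-dimensional vector."""
    C, H, W = activations.shape
    oh, ow = out_hw
    rows = np.arange(oh) * H // oh
    cols = np.arange(ow) * W // ow
    up = activations[:, rows][:, :, cols]   # (C, oh, ow)
    return up.reshape(C, -1).T              # (oh*ow, C) per-pixel vectors

def kmeans(X, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = X[:1].copy()
    for _ in range(k - 1):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, X[d.argmax()]])
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy activations: 3 channels with 4x4 maps, upsampled to an 8x8 "image".
acts = np.zeros((3, 4, 4))
acts[0, :2] = 1.0   # channel 0 fires on the top half
acts[1, 2:] = 1.0   # channel 1 fires on the bottom half
X = pixelwise_representations(acts, (8, 8))
labels = kmeans(X, k=2).reshape(8, 8)
# Each cluster label is both a detected "concept" and a pixel mask
# locating it: here the two concepts segment top vs. bottom regions.
```

Because every pixel carries its own representation, the cluster assignments double as localization masks, which is what distinguishes this setup from concept-extraction methods that only report presence or absence per image.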