Modern machine learning models have proven highly effective at solving a wide variety of real-world problems. However, their black-box character poses a major challenge to understanding and tracing the underlying decision-making strategies. As a remedy, many post-hoc explanation and self-explaining methods have been developed to interpret the models' behavior. In addition, these methods enable the identification of artifacts that the model may have learned as class-relevant features. In this work, we provide a detailed case study of the self-explaining network ProtoPNet in the presence of a spectrum of artifacts. In doing so, we identify the main drawbacks of ProtoPNet, in particular its coarse and spatially imprecise explanations. We address these limitations by introducing Prototypical Relevance Propagation (PRP), a novel method for generating more precise model-aware explanations. Furthermore, to obtain a clean dataset, we propose to use multi-view clustering strategies to segregate artifact images based on their PRP explanations, thereby suppressing potential artifact learning in the models.
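To make the idea behind PRP concrete, the following is a minimal sketch of how relevance might be propagated from a prototype's similarity map back to the input pixels using LRP-epsilon rules. The `backbone_layers` list, the `prototype_similarity_map` seed, and the identity treatment of ReLU layers are simplifying assumptions made here for illustration; the actual PRP method defines dedicated propagation rules for ProtoPNet's prototype layer.

```python
import torch
import torch.nn as nn

def lrp_backward(layer, activation, relevance, eps=1e-6):
    # One LRP-epsilon step: redistribute relevance from a layer's output
    # back to its input in proportion to each input's contribution.
    activation = activation.clone().detach().requires_grad_(True)
    z = layer(activation)
    # Stabilize the denominator away from zero (epsilon rule).
    z = z + eps * torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))
    s = (relevance / z).detach()
    (z * s).sum().backward()
    return activation * activation.grad

def prp_heatmap(backbone_layers, x, prototype_similarity_map):
    # Forward pass, caching the input to every layer.
    activations = [x]
    for layer in backbone_layers:
        activations.append(layer(activations[-1]))
    # Seed the relevance at the feature-map level with one prototype's
    # similarity scores (assumed broadcastable to the feature-map shape);
    # the real PRP rule for the prototype layer is more involved.
    relevance = prototype_similarity_map
    # Backward LRP sweep through the backbone.
    for layer, act in zip(reversed(backbone_layers), reversed(activations[:-1])):
        if isinstance(layer, nn.ReLU):
            continue  # treat ReLU as identity for relevance
        relevance = lrp_backward(layer, act, relevance)
    return relevance.sum(dim=1)  # aggregate channels into a pixel heatmap

# Illustrative usage with a toy backbone and a stand-in similarity map.
backbone = [nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 8, 3, padding=1), nn.ReLU()]
x = torch.randn(1, 3, 32, 32)
sim = torch.rand(1, 1, 32, 32)
heat = prp_heatmap(backbone, x, sim)  # shape (1, 32, 32)
```

Compared with upsampling the coarse similarity map, a backward relevance pass of this kind attributes the prototype's score to individual input pixels, which is what allows the finer spatial resolution the abstract claims.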
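The artifact-segregation step can likewise be sketched as a simple multi-view clustering pass: each image contributes several feature "views" (for instance, the raw image and its PRP heatmap), a per-view affinity matrix is built, and spectral clustering on the combined affinity separates artifact images from clean ones. Averaging RBF kernels across views is one simple multi-view strategy assumed here for illustration; the feature dimensions and data below are placeholders.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def multiview_spectral_clusters(views, n_clusters=2, gamma=None):
    # Build one RBF affinity matrix per view and average them into a
    # single combined kernel, then cluster on the precomputed affinity.
    affinity = np.mean([rbf_kernel(v, gamma=gamma) for v in views], axis=0)
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="precomputed", random_state=0)
    return model.fit_predict(affinity)

# Illustrative usage: 100 samples, two views per sample.
rng = np.random.default_rng(0)
images = rng.normal(size=(100, 64))    # flattened image features (view 1)
prp_maps = rng.normal(size=(100, 64))  # flattened PRP heatmaps (view 2)
labels = multiview_spectral_clusters([images, prp_maps], n_clusters=2)
# Images assigned to the cluster dominated by artifact-like PRP heatmaps
# can then be removed to obtain a cleaner training set.
```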