Despite their state-of-the-art performance, deep convolutional neural networks are susceptible to bias and can malfunction in unseen situations. The complex computation behind their reasoning is not sufficiently human-understandable to develop trust. External explainer methods attempt to interpret network decisions in a human-understandable way, but their assumptions and simplifications expose them to fallacies. On the other hand, inherently self-interpretable designs, while more robust to such fallacies, cannot be retrofitted to already-trained models. In this work, we propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that provides self-interpretability and enables knowledge injection while improving the model's performance. Moreover, we provide several weakly supervised knowledge-injection methodologies to enhance the training process. We verify our claims by evaluating several LAP-extended models on three different datasets, including ImageNet. The proposed framework offers interpretations that are more valid, more human-understandable, and more faithful to the model than those of commonly used white-box explainer methods.
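To make the idea of an attention-based pooling layer concrete, the following is a minimal, generic sketch of attention-weighted spatial pooling in PyTorch. It is not the paper's LAP layer; the module name, the 1x1-convolution scoring head, and all parameters are illustrative assumptions. It only shows the general pattern: score each spatial location, normalize the scores, and pool features as a score-weighted sum, so the attention weights can double as a saliency map.

```python
import torch
import torch.nn as nn

class AttentionPool2d(nn.Module):
    """Generic attention-weighted spatial pooling (illustrative sketch,
    not the paper's LAP layer). A 1x1 convolution scores each spatial
    location, the scores are softmax-normalized over locations, and the
    feature map is pooled as the score-weighted sum of its positions."""

    def __init__(self, in_channels: int):
        super().__init__()
        # hypothetical scoring head: one attention score per spatial location
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor):
        # x: (B, C, H, W)
        b, c, h, w = x.shape
        attn = self.score(x).view(b, 1, h * w)       # (B, 1, H*W) raw scores
        attn = torch.softmax(attn, dim=-1)           # normalize over locations
        feats = x.view(b, c, h * w)                  # (B, C, H*W)
        pooled = (feats * attn).sum(dim=-1)          # (B, C) weighted pooling
        # the attention weights can be reshaped into a saliency map
        return pooled, attn.view(b, 1, h, w)

# usage sketch: swap a global-average-pooling stage for the attention pool
# pool = AttentionPool2d(in_channels=512)
# pooled, saliency = pool(torch.randn(2, 512, 7, 7))
```

In such a design, the normalized attention map is produced as part of the forward pass itself, which is what makes the resulting interpretation faithful to the model rather than estimated post hoc by an external explainer.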