ProtoPNet and its follow-up variants (ProtoPNets) have attracted broad research interest for their intrinsic interpretability via prototypes and their accuracy, which is comparable to that of non-interpretable counterparts. However, it has recently been found that the interpretability of prototypes can be corrupted due to the semantic gap between similarity in latent space and similarity in input space. In this work, we make the first attempt to quantitatively evaluate the interpretability of prototype-based explanations, rather than relying solely on qualitative evaluations of a few visualization examples, which can easily be misled by cherry-picking. To this end, we propose two evaluation metrics, termed the consistency score and the stability score, to evaluate explanation consistency across images and explanation robustness against perturbations, both of which are essential for explanations to be put into practice. Furthermore, we propose a shallow-deep feature alignment (SDFA) module and a score aggregation (SA) module to improve the interpretability of prototypes. We conduct systematic evaluation experiments and substantive discussions to uncover the interpretability of existing ProtoPNets. Experiments demonstrate that our method significantly outperforms state-of-the-art methods, under both the conventional qualitative evaluations and the proposed quantitative evaluations, in both accuracy and interpretability. Code is available at https://github.com/hqhQAQ/EvalProtoPNet.
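To make the two metrics concrete, below is a minimal, illustrative sketch of how a consistency-style and a stability-style score could be computed from prototype activation maps and (hypothetical) object-part annotations. It is an assumption-laden simplification for intuition only: the tensor shapes, the 0.8 majority threshold, and the peak-location criterion are illustrative choices and do not reproduce the paper's exact definitions.

```python
# Hedged sketch of consistency/stability-style metrics for a ProtoPNet-like model.
# Assumes prototype activation maps of shape (N, P, H, W) and binary part masks
# of shape (N, K, H, W); thresholds and criteria below are illustrative only.
import torch


def consistency_score(activation_maps: torch.Tensor, part_masks: torch.Tensor) -> float:
    """Fraction of prototypes whose peak activation lands on the same annotated
    object part across most images (illustrative criterion, not the paper's)."""
    N, P, H, W = activation_maps.shape
    K = part_masks.shape[1]
    consistent = 0
    for p in range(P):
        hit_parts = []
        for n in range(N):
            # Location of the peak activation of prototype p on image n.
            y, x = divmod(activation_maps[n, p].argmax().item(), W)
            on_part = part_masks[n, :, y, x]  # (K,) which parts cover the peak
            if on_part.any():
                hit_parts.append(int(on_part.float().argmax()))
        if hit_parts:
            counts = torch.bincount(torch.tensor(hit_parts), minlength=K)
            # Count the prototype as consistent if one part dominates (threshold assumed).
            if counts.max().item() / len(hit_parts) >= 0.8:
                consistent += 1
    return consistent / P


def stability_score(maps_clean: torch.Tensor, maps_perturbed: torch.Tensor) -> float:
    """Fraction of (image, prototype) pairs whose peak-activation location is
    unchanged after a small input perturbation (illustrative criterion)."""
    peak_clean = maps_clean.flatten(2).argmax(dim=2)      # (N, P)
    peak_perturbed = maps_perturbed.flatten(2).argmax(dim=2)
    return (peak_clean == peak_perturbed).float().mean().item()
```

In this reading, the consistency score asks whether each prototype fires on the same semantic part (e.g., a bird's beak) across different images, while the stability score asks whether that localization survives small input perturbations; the released repository should be consulted for the exact formulations.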