Deep neural networks that yield human-interpretable decisions by architectural design have lately become an increasingly popular alternative to post hoc interpretation of traditional black-box models. Among these networks, the arguably most widespread approach is so-called prototype learning, where similarities to learned latent prototypes serve as the basis for classifying an unseen data point. In this work, we point to an important shortcoming of such approaches: there is a semantic gap between similarity in latent space and similarity in input space, which can corrupt interpretability. We design two experiments that exemplify this issue on the so-called ProtoPNet. Specifically, we find that this network's interpretability mechanism can be led astray by intentionally crafted artefacts or even JPEG compression artefacts, which can produce incomprehensible decisions. We argue that practitioners ought to keep this shortcoming in mind when deploying prototype-based models in practice.
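To make the latent-similarity mechanism referenced above concrete, the following is a minimal sketch (not the authors' code) of ProtoPNet-style prototype scoring: each spatial position of a latent feature map is compared against learned latent prototypes, and the max-pooled log-activation similarities would then feed a linear layer to produce class logits. The function name, tensor shapes, and epsilon value are illustrative assumptions.

```python
import torch

def prototype_similarities(latent, prototypes, eps=1e-4):
    """latent: (C, H, W) feature map of one image; prototypes: (P, C) latent prototypes.
    Returns (P,) max-pooled similarity scores using a ProtoPNet-style log activation."""
    C, H, W = latent.shape
    patches = latent.reshape(C, -1).T              # (H*W, C): each spatial position as a latent patch
    d2 = torch.cdist(prototypes, patches) ** 2     # (P, H*W): squared L2 distances in latent space
    sim = torch.log((d2 + 1) / (d2 + eps))         # small latent distance -> large similarity
    return sim.max(dim=1).values                   # best-matching patch per prototype

# Toy usage with hypothetical shapes: 512-channel 7x7 feature map, 10 prototypes.
latent = torch.randn(512, 7, 7)
prototypes = torch.randn(10, 512)
print(prototype_similarities(latent, prototypes))
```

The semantic gap discussed in this work arises because these similarities are measured purely in latent space: two image patches can be mapped to nearby latent vectors, and hence receive a high similarity score, without looking alike to a human in input space.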