Prototypical methods have recently gained significant attention due to their intrinsic interpretability, which is obtained through prototypes. With growing use of model reuse and distillation, there is a need to study the transfer of interpretability from one model to another. We present Proto2Proto, a novel method to transfer the interpretability of one prototypical part network to another via knowledge distillation. Our approach aims to add interpretability to the "dark" knowledge transferred from the teacher to the shallower student model. We propose two novel losses to facilitate such a transfer: a "Global Explanation" loss, which forces the student prototypes to be close to the teacher prototypes, and a "Patch-Prototype Correspondence" loss, which enforces that the local representations of the student are similar to those of the teacher. Further, we propose three novel metrics to evaluate the student's proximity to the teacher as measures of interpretability transfer in our setting. We qualitatively and quantitatively demonstrate the effectiveness of our method on the CUB-200-2011 and Stanford Cars datasets. Our experiments show that the proposed method indeed achieves interpretability transfer from teacher to student while simultaneously exhibiting competitive performance.
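The two losses described above can be sketched as simple distance terms. The following is a minimal, illustrative NumPy sketch (not the paper's implementation): it assumes the teacher and student have the same number of prototypes with aligned indices and the same feature dimensionality, and all function names here are hypothetical.

```python
import numpy as np

def global_explanation_loss(teacher_protos, student_protos):
    """Sketch of a 'Global Explanation'-style loss: mean squared distance
    pulling each student prototype toward the corresponding teacher
    prototype (prototype indices assumed aligned)."""
    return float(np.mean(np.sum((teacher_protos - student_protos) ** 2, axis=1)))

def patch_prototype_correspondence_loss(teacher_feats, student_feats):
    """Sketch of a 'Patch-Prototype Correspondence'-style loss: mean squared
    distance between the teacher's and student's local patch
    representations (same spatial grid and channel count assumed)."""
    return float(np.mean(np.sum((teacher_feats - student_feats) ** 2, axis=-1)))

# Toy usage with random data.
rng = np.random.default_rng(0)
P_t = rng.normal(size=(10, 64))          # 10 teacher prototypes, 64-dim
P_s = P_t + 0.1 * rng.normal(size=(10, 64))  # slightly perturbed student
F_t = rng.normal(size=(7, 7, 64))        # teacher patch features (7x7 grid)
F_s = F_t.copy()                         # identical student features

print(global_explanation_loss(P_t, P_s))            # small positive value
print(patch_prototype_correspondence_loss(F_t, F_s))  # 0.0 for identical features
```

In practice such terms would be added to the usual distillation and classification objectives; the paper's actual formulation (e.g., how active patches are selected) should be taken from the source, not from this sketch.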