When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image and pointing out prototypical aspects of one class or another. The mounting evidence for each class helps us make our final decision. In this work, we introduce a deep network architecture, the prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to how ornithologists, physicians, and others would explain to people how to solve challenging image classification tasks. The network is trained using only image-level labels, without any annotations for parts of images. We demonstrate our method on the CUB-200-2011 dataset and the Stanford Cars dataset. Our experiments show that ProtoPNet can achieve accuracy comparable to that of its analogous non-interpretable counterpart, and that when several ProtoPNets are combined into a larger network, the combined model can achieve accuracy on par with some of the best-performing deep models. Moreover, ProtoPNet provides a level of interpretability that is absent in other interpretable deep models.
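The "find prototypical parts, then combine their evidence" idea can be illustrated with a small, hedged sketch. The abstract does not specify the exact operations, so this example rests on several assumptions. Each image is represented as a list of latent patch vectors, which in a full model would come from a convolutional backbone. Each prototype is compared with every patch using a log-similarity of the squared distance. The best-matching patch per prototype is kept via max pooling. A linear layer then turns the prototype scores into class logits. All function names (`prototype_scores`, `class_logits`, `classify`) and the `eps` value are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between a patch vector and a prototype."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity(dist_sq: float, eps: float = 1e-4) -> float:
    """Assumed log-similarity: large when the distance is near zero, near zero when far."""
    return math.log((dist_sq + 1.0) / (dist_sq + eps))


def prototype_scores(patches: List[Vector], prototypes: List[Vector],
                     eps: float = 1e-4) -> List[float]:
    """For each prototype, keep the similarity of its best-matching patch (max pooling)."""
    return [max(similarity(squared_distance(z, p), eps) for z in patches)
            for p in prototypes]


def class_logits(scores: List[float], weights: List[List[float]]) -> List[float]:
    """Combine prototype evidence linearly; weights[c][j] links prototype j to class c."""
    return [sum(w * s for w, s in zip(row, scores)) for row in weights]


def classify(patches: List[Vector], prototypes: List[Vector],
             weights: List[List[float]]) -> Tuple[int, List[float]]:
    """Return the predicted class index and the per-class logits."""
    scores = prototype_scores(patches, prototypes)
    logits = class_logits(scores, weights)
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return predicted, logits


if __name__ == "__main__":
    # Toy example: three 2-D "patches", one prototype per class.
    patches = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    prototypes = [[1.0, 1.0], [9.0, 9.0]]  # prototype 0 matches a patch exactly
    weights = [[1.0, 0.0], [0.0, 1.0]]     # class c uses only prototype c
    predicted, logits = classify(patches, prototypes, weights)
    print(predicted, logits)
```

In this toy input, prototype 0 matches the patch `[1.0, 1.0]` exactly and receives a large similarity score. Prototype 1 is far from every patch, so class 0 wins. Because each class score is a weighted sum of "this patch looks like that prototype" activations, the evidence behind a prediction can be traced back to specific patches. This traceability is the kind of reasoning the abstract describes.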