Methods based on class activation maps (CAM) provide a simple mechanism to interpret predictions of convolutional neural networks by using linear combinations of feature maps as saliency maps. By contrast, masking-based methods optimize a saliency map directly in the image space or learn it by training another network on additional data. In this work, we introduce Opti-CAM, which combines ideas from CAM-based and masking-based approaches. Our saliency map is a linear combination of feature maps whose weights are optimized per image to maximize the logit of the masked image for a given class. We also fix a fundamental flaw in two of the most common evaluation metrics for attribution methods. On several datasets, Opti-CAM substantially outperforms other CAM-based approaches according to the most relevant classification metrics. Finally, we provide empirical evidence that localization and classifier interpretability are not necessarily aligned.
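The per-image objective described above can be illustrated with a small sketch. It does not reproduce the authors' implementation. It fills in several details the abstract leaves open, and these are assumptions: the channel weights are softmax-normalized, the weighted map is min-max normalized to [0, 1], the map is upsampled to image size by nearest-neighbour repetition, a toy linear `class_logit` stands in for the CNN, and the gradient comes from central finite differences rather than backpropagation. The function names (`saliency_map`, `masked_logit`, `opti_cam_weights`) are hypothetical.

```python
import numpy as np


def softmax(u: np.ndarray) -> np.ndarray:
    # Numerically stable softmax: channel weights stay positive and sum to 1 (assumed parameterization).
    e = np.exp(u - u.max())
    return e / e.sum()


def saliency_map(u: np.ndarray, feature_maps: np.ndarray, out_shape) -> np.ndarray:
    """Weighted sum of feature maps (K, h, w), upsampled and min-max normalized to [0, 1]."""
    s = np.tensordot(softmax(u), feature_maps, axes=1)  # linear combination -> (h, w)
    fy = out_shape[0] // s.shape[0]
    fx = out_shape[1] // s.shape[1]
    # Nearest-neighbour upsampling; assumes the image size is an integer multiple of (h, w).
    s = np.repeat(np.repeat(s, fy, axis=0), fx, axis=1)
    return (s - s.min()) / (s.max() - s.min() + 1e-8)


def masked_logit(u, feature_maps, image, class_logit) -> float:
    # Score of the target class on the image multiplied element-wise by the saliency mask.
    mask = saliency_map(u, feature_maps, image.shape)
    return class_logit(image * mask)


def opti_cam_weights(feature_maps, image, class_logit, steps=100, lr=1.0, eps=1e-4):
    """Gradient ascent on per-image weights u, using central finite differences (sketch only)."""
    u = np.zeros(feature_maps.shape[0])  # uniform weights at initialization
    for _ in range(steps):
        grad = np.zeros_like(u)
        for k in range(u.size):
            d = np.zeros_like(u)
            d[k] = eps
            hi = masked_logit(u + d, feature_maps, image, class_logit)
            lo = masked_logit(u - d, feature_maps, image, class_logit)
            grad[k] = (hi - lo) / (2 * eps)
        u = u + lr * grad  # ascend: we maximize the masked logit
    return u
```

In a toy case, one feature map covers a pixel that supports the class and another covers a pixel that opposes it. Starting from uniform weights, the optimization shifts weight onto the supporting map, and the masked logit rises from 0 toward 1. With a real network, the feature maps would come from a chosen convolutional layer, and an autodiff framework would compute the gradient. The finite differences here only keep the example free of heavy dependencies.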