Interpretation and improvement of deep neural networks rely on a better understanding of their underlying mechanisms. In particular, gradients of classes or concepts with respect to the input features (e.g., pixels in images) are often used as importance scores or estimators, which are visualized in saliency maps. Thus, a family of saliency methods provides an intuitive way to identify input features with substantial influence on classifications or latent concepts. Several modifications to conventional saliency maps, such as Rectified Gradients and Layer-wise Relevance Propagation (LRP), have been introduced to allegedly denoise and improve interpretability. While visually coherent in certain cases, Rectified Gradients and other modified saliency maps introduce a strong input bias (e.g., brightness in the RGB space) because of inappropriate uses of the input features. We demonstrate that dark areas of an input image are not highlighted by a saliency map using Rectified Gradients, even if they are relevant to the class or concept. Even in scaled images, the input bias persists around an artificial point in the color spectrum. Our modification, which simply eliminates multiplication with the input features, removes this bias. This showcases how a visual criterion may not align with the true explainability of deep learning models.
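The contrast described above can be sketched with a few lines of code. The following is a minimal, illustrative PyTorch sketch, not the paper's implementation: it shows only the multiplication-with-input step that induces the brightness bias and the variant that drops it (the thresholding used by Rectified Gradients during backpropagation is omitted); the toy model and random image are hypothetical placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical toy classifier and image; any differentiable model works the same way.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
image = torch.rand(1, 3, 32, 32, requires_grad=True)

logits = model(image)
logits[0, logits[0].argmax()].backward()  # gradient of the top class score w.r.t. pixels

grad = image.grad.detach()

# Input-weighted map (gradient * input): near-zero (dark) pixels receive near-zero
# relevance regardless of how large their gradient is, producing the input bias.
saliency_weighted = (grad * image.detach()).abs().sum(dim=1)

# Variant described in the abstract: eliminate the multiplication with the input,
# keeping only the gradient magnitude, so dark but relevant regions can still be highlighted.
saliency_plain = grad.abs().sum(dim=1)
```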