Recent years have seen the introduction of a range of methods for post-hoc explainability of image classifier predictions. However, these post-hoc explanations may not always be faithful to classifier predictions, which poses a significant challenge when attempting to debug models based on such explanations. To this end, we seek a methodology that can improve the faithfulness of an explanation method with respect to model predictions and that does not require ground-truth explanations. We achieve this through a novel explanation-driven data augmentation (EDDA) technique that augments the training data with occlusions inferred from model explanations; this is based on the simple motivating principle that \emph{if} the explainer is faithful to the model \emph{then} occluding regions that are salient for the model prediction should decrease the model's confidence in that prediction, while occluding non-salient regions should not change the prediction. To verify that the proposed augmentation method has the potential to improve faithfulness, we evaluate EDDA on a variety of datasets and classification models. We demonstrate empirically that our approach leads to a significant increase in faithfulness, which can facilitate better debugging and successful deployment of image classification models in real-world applications.
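To make the motivating principle concrete, the following is a minimal sketch of the occlusion-based augmentation idea, not the authors' implementation. It assumes an image given as a NumPy array and a per-pixel saliency map \texttt{saliency} of matching spatial size produced by some explainer (e.g., a Grad-CAM-style method); the function name \texttt{edda\_augment} and the parameters \texttt{occlusion\_value} and \texttt{quantile} are illustrative choices, not taken from the paper.

\begin{verbatim}
# Hypothetical sketch of the EDDA occlusion principle (illustrative only).
# Assumes `image` is an (H, W, C) array and `saliency` an (H, W) map in [0, 1]
# produced by an external explainer for the model's predicted class.
import numpy as np

def edda_augment(image, saliency, occlusion_value=0.0, quantile=0.9):
    """Build two occluded variants of `image` from its saliency map.

    - salient_occluded: the most salient pixels (top quantile) are masked;
      if the explainer is faithful, the model's confidence in its original
      prediction should drop on this image.
    - nonsalient_occluded: an equally large set of low-saliency pixels is
      masked; the model's prediction should remain unchanged.
    """
    high = np.quantile(saliency, quantile)
    low = np.quantile(saliency, 1.0 - quantile)
    salient_mask = saliency >= high
    nonsalient_mask = saliency <= low

    salient_occluded = image.copy()
    salient_occluded[salient_mask] = occlusion_value

    nonsalient_occluded = image.copy()
    nonsalient_occluded[nonsalient_mask] = occlusion_value

    # Both variants can be added to the training data: the non-salient
    # variant keeps the original label, while the salient variant is used
    # to encourage lower confidence in the original prediction.
    return salient_occluded, nonsalient_occluded
\end{verbatim}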