EDDA: Explanation-driven Data Augmentation to Improve Model and Explanation Alignment

Recent years have seen the introduction of a range of methods for post-hoc explainability of image classifier predictions. However, these post-hoc explanations may not always align with classifier predictions, which poses a significant challenge when attempting to debug models based on such explanations. To this end, we seek a methodology that improves alignment between model predictions and explanation methods, is agnostic to both the model class and the explanation class, and does not require ground-truth explanations. We achieve this through a novel explanation-driven data augmentation (EDDA) method that augments the training data with occlusions of existing data derived from model explanations. The motivating principle is simple: if the model and explainer are aligned, occluding regions that are salient for the model prediction should decrease the model's confidence in that prediction, while occluding non-salient regions should not change the prediction. To verify that this augmentation method improves model and explainer alignment, we evaluate it on a variety of datasets, image classification models, and explanation methods. In all cases, our explanation-driven data augmentation method improves alignment of the model and explanation compared with no data augmentation and with augmentation methods that are not explanation-driven. In conclusion, this approach provides a novel model- and explainer-agnostic methodology for improving alignment between model predictions and explanations, which we see as a critical step forward for practical deployment and debugging of image classification models.
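The occlusion principle described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the names `occlude` and `edda_variants`, the saliency threshold of 0.5, the zero fill value, and the `keeps_label` flag are all assumptions made for illustration. The abstract does not say how occluded images are labeled or how saliency is thresholded, so the flag only records the expectation that a non-salient occlusion should leave the prediction unchanged while a salient occlusion should not.

```python
from typing import List, Tuple

Grid = List[List[float]]  # toy single-channel image or saliency map


def occlude(image: Grid, saliency: Grid, threshold: float,
            salient: bool, fill: float = 0.0) -> Grid:
    """Return a copy of `image` with selected pixels replaced by `fill`.

    salient=True  occludes pixels whose saliency is >= threshold.
    salient=False occludes pixels whose saliency is <  threshold.
    """
    return [
        [fill if (s >= threshold) == salient else p
         for p, s in zip(img_row, sal_row)]
        for img_row, sal_row in zip(image, saliency)
    ]


def edda_variants(image: Grid, saliency: Grid,
                  threshold: float = 0.5) -> List[Tuple[Grid, bool]]:
    """Build two occluded variants of one training image.

    Each entry is (occluded_image, keeps_label). The flag is an assumption:
    occluding salient pixels is expected to lower confidence in the original
    class, occluding non-salient pixels is expected to preserve it.
    """
    return [
        (occlude(image, saliency, threshold, salient=True), False),
        (occlude(image, saliency, threshold, salient=False), True),
    ]


# Toy 2x2 example with a hand-written saliency map (hypothetical values).
image = [[0.2, 0.9], [0.8, 0.1]]
saliency = [[0.1, 0.7], [0.6, 0.0]]
variants = edda_variants(image, saliency)
```

In practice the saliency map would come from whichever explanation method is being aligned with the model, and the variants would be added to the training set alongside the original image.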
