Identifying the spurious correlations that a trained model has learned is central to refining the model and making it trustworthy. We present a simple method for identifying spurious correlations learned by a model trained for image classification. We apply image-level perturbations and monitor the resulting changes in the certainty of the model's predictions. We demonstrate the approach on an image classification dataset whose images contain synthetically generated spurious regions, and show that the trained model was overdependent on those regions. Moreover, we remove the learned spurious correlations with an explanation-based learning approach.
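As a rough illustration of the perturb-and-monitor idea, the following minimal sketch occludes one tile of an image at a time and records how much the probability of a target class drops. The abstract does not specify the perturbation, so tile occlusion is an assumption made here for illustration. The names `confidence_drop_map` and `toy_predict_proba`, the tile size, and the fill value are likewise hypothetical, and the toy classifier is a stand-in that depends only on a synthetic corner patch rather than a trained network.

```python
import numpy as np


def confidence_drop_map(predict_proba, image, target, patch=4, fill=0.0):
    """Occlude each patch-sized tile and record the drop in target-class probability.

    predict_proba is assumed to map an image array to a 1-D array of class
    probabilities. This is an illustrative interface, not one taken from the paper.
    """
    h, w = image.shape[:2]
    base = predict_proba(image)[target]  # certainty on the unperturbed image
    rows = -(-h // patch)  # ceiling division, so partial edge tiles are covered
    cols = -(-w // patch)
    drops = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            perturbed = image.copy()
            perturbed[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = fill
            drops[i, j] = base - predict_proba(perturbed)[target]
    return drops


def toy_predict_proba(image):
    """Hypothetical stand-in model that looks only at the top-left 4x4 corner."""
    score = image[:4, :4].mean()
    p1 = 1.0 / (1.0 + np.exp(-10.0 * (score - 0.5)))
    return np.array([1.0 - p1, p1])


# Synthetic 8x8 image with a bright "spurious" patch in the top-left corner.
image = np.full((8, 8), 0.2)
image[:4, :4] = 1.0

drops = confidence_drop_map(toy_predict_proba, image, target=1)
most_influential_tile = np.unravel_index(drops.argmax(), drops.shape)
```

A tile whose occlusion causes a large drop in certainty is one the model relies on heavily. If that tile lies outside the object of interest, as the corner patch does in this toy setup, the dependence is a candidate spurious correlation.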