Understanding the influence of a training instance on a neural network model helps improve the model's interpretability. However, evaluating this influence, i.e., how the model's prediction would change if the instance were not used in training, is difficult and computationally expensive. In this paper, we propose an efficient method for estimating the influence. Our method is inspired by dropout, which zero-masks a sub-network and thereby prevents that sub-network from learning a given training instance. By switching between dropout masks, we can obtain sub-networks that did or did not learn each training instance and use the difference in their predictions to estimate its influence. Through experiments with BERT and VGGNet on classification datasets, we demonstrate that the proposed method captures training influences, enhances the interpretability of erroneous predictions, and can cleanse the training dataset to improve generalization.
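To make the mask-switching idea concrete, the sketch below illustrates one way it could work on a toy MLP. This is a minimal, illustrative implementation, not the authors' code: the names `instance_mask`, `MaskedMLP`, `train`, and `influence`, the dropout rate p = 0.5, and the use of a deterministic per-instance random seed are all assumptions made for the example.

```python
# Minimal sketch (assumptions, not the paper's implementation): each training
# instance i gets a fixed dropout mask seeded by i, so the complementary
# ("flipped") sub-network never receives gradient from instance i.
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 0.5  # assumed dropout rate; p = 0.5 makes a mask and its flip symmetric

def instance_mask(instance_id: int, size: int) -> torch.Tensor:
    """Deterministic dropout mask tied to one training instance (inverted scaling)."""
    g = torch.Generator().manual_seed(instance_id)
    return (torch.rand(size, generator=g) >= P).float() / (1.0 - P)

class MaskedMLP(nn.Module):
    def __init__(self, d_in=20, d_hidden=64, n_classes=2):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, n_classes)

    def forward(self, x, mask):
        # The mask zeroes one sub-network of hidden units; its flip is the complement.
        return self.fc2(F.relu(self.fc1(x)) * mask)

def train(model, xs, ys, epochs=5, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for i in range(len(xs)):
            # Instance i is only ever seen through its own mask.
            mask = instance_mask(i, model.fc1.out_features)
            loss = F.cross_entropy(model(xs[i:i+1], mask), ys[i:i+1])
            opt.zero_grad()
            loss.backward()
            opt.step()

def influence(model, i, x_test, y_test):
    """Test-loss change between the sub-network that never saw instance i
    (flipped mask) and the sub-network that learned it (its own mask)."""
    mask = instance_mask(i, model.fc1.out_features)
    flipped = (mask == 0).float() / P  # complement of the binary mask, rescaled
    with torch.no_grad():
        loss_without = F.cross_entropy(model(x_test, flipped), y_test)
        loss_with = F.cross_entropy(model(x_test, mask), y_test)
    return (loss_without - loss_with).item()

# Toy usage: score every training instance's influence on one test point.
xs, ys = torch.randn(32, 20), torch.randint(0, 2, (32,))
model = MaskedMLP()
train(model, xs, ys)
x_test, y_test = torch.randn(1, 20), torch.randint(0, 2, (1,))
scores = [influence(model, i, x_test, y_test) for i in range(len(xs))]
```

Under this reading, a positive score means the test loss is higher without instance i's sub-network, i.e., the instance was helpful for that prediction, while a strongly negative score flags a potentially harmful or mislabeled instance; no retraining is needed because both sub-networks already live inside the single trained model.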