In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect of leave-one-out training on the loss function, have been shown to be fragile, and the reason for this fragility remains unclear. Although prior work suggests that regularization can improve robustness, this does not hold in all cases. In this work, we revisit the experiments performed in prior work to understand the mechanisms underlying influence function fragility. First, we validate influence functions using procedures from the literature under conditions where their convexity assumptions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Throughout, we analyze the key metrics and procedures used to validate influence functions. Our results indicate that the validation procedures themselves may cause the observed fragility.
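As background, the influence-function approximation in question (in the sense of Koh and Liang, 2017) estimates how removing a single training point $z$ would change the loss at a test point $z_{\mathrm{test}}$ without retraining. A minimal sketch of the standard formulation, where $\hat\theta$ denotes the trained parameters, $L$ the loss, $n$ the number of training points, and $H_{\hat\theta}$ the empirical Hessian (notation introduced here for illustration), is:

\[
\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}})
  = -\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^{\top}
    H_{\hat\theta}^{-1}\,
    \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta),
\]

so that the change in test loss from leaving $z$ out of training is approximated by $-\tfrac{1}{n}\,\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}})$. The inverse $H_{\hat\theta}^{-1}$ is guaranteed to exist only when the loss is strictly convex in $\theta$, which is the convexity assumption that our non-convex experiments relax.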