In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect of leave-one-out retraining on the loss, have been shown to be fragile, and the underlying reason for this fragility remains unclear. Although previous work suggests using regularization to increase robustness, this does not hold in all cases. In this work, we revisit the experiments of prior work in an effort to understand the mechanisms underlying influence-function fragility. First, we validate influence functions using procedures from the literature under conditions where their convexity assumptions are met. We then relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Throughout, we analyze the key metrics and procedures used to validate influence functions. Our results indicate that the validation procedures themselves may cause the observed fragility.
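As a brief sketch of the approximation being validated, the standard formulation from the influence-functions literature (Koh and Liang, 2017) can be written as follows; the symbols $z$, $z_{\text{test}}$, $\hat\theta$, and $H_{\hat\theta}$ are introduced here for illustration and are not defined in the abstract:

\[
\hat\theta = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i,\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2} L(z_i,\hat\theta),
\]
\[
\mathcal{I}(z, z_{\text{test}}) = -\,\nabla_{\theta} L(z_{\text{test}},\hat\theta)^{\top} H_{\hat\theta}^{-1}\, \nabla_{\theta} L(z,\hat\theta),
\]

where $-\tfrac{1}{n}\,\mathcal{I}(z, z_{\text{test}})$ approximates the change in test loss that would result from removing the training point $z$ and retraining. The derivation assumes a twice-differentiable, strictly convex loss so that $H_{\hat\theta}$ is positive definite and invertible; this is the convexity assumption referred to above, which deeper models and more complex datasets violate.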