Understanding the black-box predictions of neural networks is challenging. To this end, early studies designed the influence function (IF) to measure the effect of removing a single training point on a trained network. However, the classic implicit Hessian-vector product (IHVP) method for calculating the IF is fragile, and theoretical analysis of IFs in the context of neural networks is still lacking. We therefore use neural tangent kernel (NTK) theory to calculate the IF for neural networks trained with a regularized mean-square loss, and prove that the approximation error can be made arbitrarily small for two-layer ReLU networks of sufficient width. We also analyze the error bound of the classic IHVP method in the over-parameterized regime to understand when and why it fails. Specifically, our theoretical analysis reveals that (1) the accuracy of IHVP depends on the regularization term, and is quite low under weak regularization; (2) the accuracy of IHVP correlates significantly with the probability density of the corresponding training points. We further draw on NTK theory to better understand IFs, including quantifying the complexity of influential samples and characterizing how IFs vary over the training dynamics. Numerical experiments on real-world data confirm our theoretical results and support our findings.
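To make the quantity under discussion concrete, the following is a minimal sketch of the classic influence-function estimate that IHVP approximates, using ridge-regularized linear regression as a tractable stand-in for a network trained with regularized mean-square loss. All names and the toy data are illustrative, not from the paper.

```python
import numpy as np

# Illustrative toy setup (not the paper's experiments): a ridge-regularized
# linear model as a stand-in for regularized mean-square-loss training.
rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def fit(X, y):
    """Minimize (1/2)||Xw - y||^2 + (lam/2)||w||^2 in closed form."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w = fit(X, y)
H = X.T @ X + lam * np.eye(d)        # Hessian of the regularized loss
x_test = rng.normal(size=d)

i = 0                                # training point to "remove"
g_i = (X[i] @ w - y[i]) * X[i]       # gradient of point i's loss term at w
# Influence-function estimate of the change in the test prediction when
# point i is removed: grad_test^T H^{-1} g_i  (grad_test = x_test here,
# since the model is linear). IHVP methods approximate the H^{-1} g_i solve.
influence = x_test @ np.linalg.solve(H, g_i)

# Ground truth: actually retrain without point i (leave-one-out).
w_loo = fit(np.delete(X, i, 0), np.delete(y, i))
actual = x_test @ (w_loo - w)
```

The gap between `influence` and `actual` comes from using the full-data Hessian `H` in place of the leave-one-out Hessian. When the regularization `lam` is weak and the model is over-parameterized, `H` becomes ill-conditioned and the inverse-Hessian solve is unstable, which is the fragility of IHVP that the abstract's analysis addresses.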