We address the efficient calculation of influence functions for tracing predictions back to the training data. We propose and analyze a new approach to speeding up the inverse-Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transformer models with several hundred million parameters. We evaluate our approach on image classification and sequence-to-sequence tasks with tens of millions to a hundred million training examples. Our code will be available at https://github.com/google-research/jax-influence.
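To illustrate the core idea, the sketch below shows Arnoldi iteration driven by Hessian-vector products in JAX. This is a minimal, hypothetical example rather than the repository's implementation: it assumes a scalar loss over a flat parameter vector, and the names (`loss_fn`, `params`, `n_iters`) are placeholders. The resulting Hessenberg matrix can be eigendecomposed to approximate the dominant eigenspace of the Hessian, onto which gradients are projected when estimating inverse-Hessian-vector products for influence scores.

```python
# Illustrative sketch only (not the authors' implementation).
import jax
import jax.numpy as jnp


def hvp(loss_fn, params, v):
    # Hessian-vector product via forward-over-reverse differentiation,
    # so the full Hessian is never materialized.
    return jax.jvp(jax.grad(loss_fn), (params,), (v,))[1]


def arnoldi(matvec, v0, n_iters):
    # Build an orthonormal basis Q and an upper Hessenberg matrix H
    # such that matvec(Q[:, :k]) is approximated within span(Q[:, :k+1]).
    q = [v0 / jnp.linalg.norm(v0)]
    H = jnp.zeros((n_iters + 1, n_iters))
    for k in range(n_iters):
        w = matvec(q[k])
        # Modified Gram-Schmidt orthogonalization against previous vectors.
        for j in range(k + 1):
            h = jnp.vdot(q[j], w)
            H = H.at[j, k].set(h)
            w = w - h * q[j]
        norm = jnp.linalg.norm(w)
        H = H.at[k + 1, k].set(norm)
        # Breakdown handling (norm ~ 0) is omitted for brevity.
        q.append(w / norm)
    return jnp.stack(q), H


# Usage sketch: approximate the top eigenpairs of the loss Hessian.
# loss_fn = lambda p: ...  # scalar training loss over flat params p
# Q, H = arnoldi(lambda v: hvp(loss_fn, params, v), v0, n_iters=30)
# eigvals, eigvecs = jnp.linalg.eig(H[:-1, :])  # Ritz values/vectors
```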