The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak.
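To illustrate the intended workflow, below is a minimal usage sketch in PyTorch. It is based on our reading of the quickstart in the linked repository; the class name `TRAKer`, the methods `load_checkpoint`, `featurize`, `finalize_features`, `start_scoring_checkpoint`, `score`, and `finalize_scores`, and their signatures are assumptions and may differ from the current API, so treat this as a sketch rather than a definitive example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from trak import TRAKer  # assumed import path, per the linked repository

# Toy setup for illustration only: a small classifier and synthetic data.
# (The library targets GPU execution; device handling is simplified here.)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10)).to(device)
train_ds = TensorDataset(torch.randn(256, 3, 32, 32),
                         torch.randint(0, 10, (256,)))
val_ds = TensorDataset(torch.randn(32, 3, 32, 32),
                       torch.randint(0, 10, (32,)))
train_loader = DataLoader(train_ds, batch_size=64)
val_loader = DataLoader(val_ds, batch_size=32)

traker = TRAKer(model=model,
                task="image_classification",
                train_set_size=len(train_ds))

# Featurize the training set under one trained checkpoint. TRAK can
# aggregate a handful of checkpoints by repeating this with model_id=1, 2, ...
traker.load_checkpoint(model.state_dict(), model_id=0)
for x, y in train_loader:
    traker.featurize(batch=(x.to(device), y.to(device)),
                     num_samples=x.shape[0])
traker.finalize_features()

# Attribute each target (validation) prediction back to the training examples.
traker.start_scoring_checkpoint(exp_name="quickstart",
                                checkpoint=model.state_dict(),
                                model_id=0,
                                num_targets=len(val_ds))
for x, y in val_loader:
    traker.score(batch=(x.to(device), y.to(device)),
                 num_samples=x.shape[0])
# Assumed output: an attribution-score matrix of shape
# (train_set_size, num_targets), one column per target example.
scores = traker.finalize_scores(exp_name="quickstart")
```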