Influence functions approximate the "influence" of individual training data points on test predictions and have a wide variety of applications. Despite their popularity, their computational cost does not scale well with model and training data size. We present FastIF, a set of simple modifications to influence functions that significantly improves their run-time. We use k-Nearest Neighbors (kNN) to narrow the search space down to a subset of good candidate data points, identify the configurations that best balance the speed-quality trade-off in estimating the inverse Hessian-vector product, and introduce a fast parallel variant. Our proposed method achieves about an 80x speedup while remaining highly correlated with the original influence values. With fast influence functions available, we demonstrate their usefulness in four applications. First, we examine whether influential data points can "explain" test-time behavior using the framework of simulatability. Second, we visualize the influence interactions between training and test data points. Third, we show that we can correct model errors by additional fine-tuning on certain influential data points, improving the accuracy of a trained MultiNLI model by 2.5% on the HANS dataset. Finally, we experiment with a similar setup but fine-tune on data points not seen during training, improving the model accuracy by 2.8% and 1.7% on the HANS and ANLI datasets, respectively. Overall, our fast influence functions can be applied efficiently to large models and datasets, and our experiments demonstrate the potential of influence functions for model interpretation and for correcting model errors. Code is available at https://github.com/salesforce/fast-influence-functions
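To make the two main ingredients concrete, the following is a minimal sketch of kNN candidate selection followed by an iterative inverse Hessian-vector product estimate. It is not the FastIF implementation: it assumes a toy linear regression model with squared loss, uses raw inputs in place of learned model features for the kNN step, and computes exact full-batch Hessian-vector products rather than mini-batch estimates. All function names, the `scale` and `steps` parameters, and the data are hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def knn_candidates(query, train_feats, k):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_feats)),
                   key=lambda i: math.dist(query, train_feats[i]))
    return order[:k]

def grad(w, x, y):
    """Gradient of the toy loss 0.5 * (w.x - y)^2 with respect to w."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]

def hvp(xs, v):
    """Exact Hessian-vector product for the mean toy loss: mean_i (x_i.v) x_i."""
    out = [0.0] * len(v)
    for x in xs:
        c = dot(x, v) / len(xs)
        out = [o + c * xi for o, xi in zip(out, x)]
    return out

def inverse_hvp(xs, v, scale=4.0, steps=500):
    """Approximate H^{-1} v with the recursion h <- v + h - (H h) / scale.

    The recursion converges when every eigenvalue of H / scale lies in (0, 2);
    `scale` and `steps` are the knobs that trade speed against quality.
    """
    h = list(v)
    for _ in range(steps):
        hh = hvp(xs, h)
        h = [vi + hi - hhi / scale for vi, hi, hhi in zip(v, h, hh)]
    return [hi / scale for hi in h]

def fast_influence(w, train, test_point, k, scale=4.0, steps=500):
    """Influence scores -grad_test^T H^{-1} grad_train, for kNN candidates only."""
    xs = [x for x, _ in train]
    x_test, y_test = test_point
    candidates = knn_candidates(x_test, xs, k)
    # The inverse HVP is computed once per test point and reused for all candidates.
    s_test = inverse_hvp(xs, grad(w, x_test, y_test), scale, steps)
    return {i: -dot(s_test, grad(w, *train[i])) for i in candidates}

# Hypothetical toy data: four training points, one test point.
train = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.5), ([1.0, 1.0], 2.0), ([2.0, 1.0], 2.0)]
weights = [0.8, 0.6]
print(fast_influence(weights, train, ([1.0, 0.9], 1.0), k=2))
```

In this sketch the expensive step, the inverse Hessian-vector product, is computed once per test point, and the kNN step limits how many training gradients are scored against it. That is the general shape of the savings described above, although the actual speedup depends on the model, the data, and the chosen configuration.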