It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
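The quantity being approximated is simple to state: FSD is the average discrepancy between two networks' outputs over the training set. Below is a minimal sketch (not the authors' implementation) that computes this exact FSD for two small ReLU MLPs and extracts one gating statistic per hidden unit, the ingredient LAFTR summarizes; the class and function names (`MLP`, `fsd_exact`, `gate_probs`, `gated_forward`) are hypothetical, and unlike the proposed method, the gated variant here still evaluates on the data rather than computing the expectation in closed form.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Toy one-hidden-layer ReLU network (illustrative only)."""
    def __init__(self, d_in=8, d_hidden=32, d_out=4):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def fsd_exact(net_a, net_b, data):
    """Exact FSD: average squared discrepancy between the two networks' outputs."""
    with torch.no_grad():
        return ((net_a(data) - net_b(data)) ** 2).sum(dim=1).mean()

def gate_probs(net, data):
    """One scalar per hidden unit: how often its ReLU is active on the data."""
    with torch.no_grad():
        return (net.fc1(data) > 0).float().mean(dim=0)

def gated_forward(net, x, gates):
    """Linearized network: each ReLU replaced by a fixed per-unit gate."""
    return net.fc2(net.fc1(x) * gates)

if __name__ == "__main__":
    torch.manual_seed(0)
    data = torch.randn(256, 8)
    net_a, net_b = MLP(), MLP()
    gates = gate_probs(net_a, data)  # one stored parameter per unit
    with torch.no_grad():
        approx = ((gated_forward(net_a, data, gates)
                   - gated_forward(net_b, data, gates)) ** 2).sum(dim=1).mean()
    print(f"exact FSD: {fsd_exact(net_a, net_b, data).item():.4f}")
    print(f"gated FSD: {approx.item():.4f}")
```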