As model finetuning is central to modern NLP, we set out to maximize its efficiency. Motivated by the observation that training examples are often redundant, we design an algorithm that filters the examples in a streaming fashion. Our two key techniques are: (1) automatically determining a training loss threshold for skipping the backward propagation; and (2) maintaining a meta predictor for further skipping the forward propagation. Instantiated as a three-stage process, our algorithm reduces the required training examples by up to 5$\times$ on a diverse set of benchmarks while only incurring minor degradation on average. Our method is effective even with as few as one training epoch, where each training example is encountered only once. It is simple to implement and is compatible with existing model finetuning optimizations such as layer freezing.
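A minimal sketch of the three-stage filtering idea described above, assuming a PyTorch-style training loop; the names (`MetaPredictor`, `LOSS_THRESHOLD`, `filtered_finetune_step`) and the fixed threshold are illustrative assumptions, not the paper's actual implementation, which determines the threshold automatically.

```python
import torch
import torch.nn as nn

# Illustrative constant; the paper's method derives this threshold automatically.
LOSS_THRESHOLD = 0.1


class MetaPredictor(nn.Module):
    """Tiny classifier that guesses whether an example's loss will fall
    below the threshold, letting us skip its forward pass entirely."""

    def __init__(self, feature_dim: int):
        super().__init__()
        self.linear = nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(features))


def filtered_finetune_step(model, meta, optimizer, loss_fn,
                           features, inputs, labels):
    # Stage 1: cheap meta prediction decides whether to run the model at all.
    if meta(features).item() > 0.5:
        return None  # predicted "easy" example: skip forward and backward

    # Stage 2: forward pass; skip the backward pass if the loss is already low.
    logits = model(inputs)
    loss = loss_fn(logits, labels)
    if loss.item() < LOSS_THRESHOLD:
        return loss.item()

    # Stage 3: full update for the remaining (informative) examples.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, only examples that survive both filters pay for a backward pass, which is where the claimed reduction in effective training examples comes from.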