The current standard approach for fine-tuning transformer-based language models uses a fixed number of training epochs and a linear learning rate schedule. To obtain a near-optimal model for a given downstream task, a search over the optimization hyperparameter space is usually required; in particular, the number of training epochs needs to be adjusted to the dataset size. In this paper, we introduce adaptive fine-tuning, an alternative approach that uses early stopping and a custom learning rate schedule to dynamically adapt the number of training epochs to the dataset size. For the example use case of named entity recognition, we show that our approach not only renders the hyperparameter search over the number of training epochs redundant, but also leads to improvements in performance, stability and efficiency. This holds true especially for small datasets, where we outperform the state-of-the-art fine-tuning method by a large margin.
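The core idea can be illustrated with a minimal sketch. The following PyTorch code is not the paper's implementation; the function name adaptive_fine_tune, the constant placeholder learning rate schedule, and the hyperparameter defaults (patience, maximum epoch cap, learning rate) are illustrative assumptions. It shows only the mechanism the abstract describes: validation-based early stopping replaces a fixed epoch count, so the effective number of training epochs adapts to the dataset.

```python
# Minimal sketch (not the paper's exact method): fine-tuning with early stopping
# instead of a fixed number of epochs. Model, data loaders, loss function and the
# concrete learning rate schedule are placeholders assumed for illustration.
import copy
import torch

def adaptive_fine_tune(model, train_loader, val_loader, loss_fn,
                       lr=2e-5, patience=3, max_epochs=100):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    # Illustrative custom schedule: here simply a constant learning rate; any
    # schedule that does not presuppose a fixed total number of epochs could
    # be plugged in instead of the linear schedule of the standard approach.
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)

    best_val, best_state, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch_x), batch_y)
            loss.backward()
            optimizer.step()
            scheduler.step()

        # Validation pass; early stopping determines how many epochs are run.
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader)

        if val_loss < best_val:
            best_val = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # training length adapts to the dataset, no fixed epoch count

    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return model
```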