With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models pre-trained on a large corpus of data. However, our experiments show that even fine-tuning models like BERT can take many hours when using GPUs. While prior work proposes limiting the number of layers that are fine-tuned, e.g., freezing all layers but the last layer, we find that such static approaches lead to reduced accuracy. We propose AutoFreeze, a system that uses an adaptive approach to choose which layers are trained, and show how this can accelerate model fine-tuning while preserving accuracy. We also develop mechanisms to enable efficient caching of intermediate activations, which can reduce the forward computation time when performing fine-tuning. Our evaluation on four NLP tasks shows that AutoFreeze, with caching enabled, can improve fine-tuning performance by up to 2.55x.
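To illustrate the adaptive-freezing idea at a high level, the sketch below picks a prefix of layers to freeze based on per-layer gradient norms. The specific criterion (freezing the earliest layers whose gradient norm falls below a fraction of the largest per-layer norm) is a hypothetical stand-in for AutoFreeze's actual decision rule, chosen only to show the shape of such a policy; front layers of pre-trained models tend to converge first, which is why a contiguous prefix is frozen.

```python
def choose_frozen_prefix(grad_norms, frac=0.1):
    """Return how many leading layers to freeze.

    grad_norms: per-layer gradient norms from a recent training step,
                ordered from the first (input-side) layer to the last.
    frac:       hypothetical threshold; a layer in the leading prefix is
                frozen if its norm is below frac * max(grad_norms).
    """
    if not grad_norms:
        return 0
    cutoff = frac * max(grad_norms)
    frozen = 0
    for g in grad_norms:
        if g < cutoff:
            frozen += 1  # this early layer has largely converged
        else:
            break  # stop at the first still-active layer
    return frozen


# Example: the first two layers have tiny gradients relative to the rest,
# so only they are frozen; training continues on the remaining layers.
print(choose_frozen_prefix([0.01, 0.02, 0.5, 1.0]))
```

Once a prefix is frozen, its outputs are fixed for a given input, which is what makes the activation caching described above possible: the forward pass through frozen layers can be replaced by a cache lookup on later epochs.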