With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models pre-trained on a large corpus of data. However, our experiments show that even fine-tuning models like BERT can take many hours when using GPUs. While prior work proposes limiting the number of layers that are fine-tuned, e.g., freezing all layers but the last layer, we find that such static approaches lead to reduced accuracy. We propose AutoFreeze, a system that uses an adaptive approach to choose which layers are trained, and show how this can accelerate model fine-tuning while preserving accuracy. We also develop mechanisms to enable efficient caching of intermediate activations, which can reduce the forward computation time when performing fine-tuning. Our evaluation on four NLP tasks shows that AutoFreeze, with caching enabled, can improve fine-tuning performance by up to 2.55x.
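To illustrate the adaptive-freezing idea at a high level, the sketch below picks a prefix of layers to freeze based on per-layer gradient norms. The specific criterion (freezing the earliest layers whose gradient norm falls below a fraction of the largest per-layer norm) is a hypothetical stand-in for AutoFreeze's actual decision rule, chosen only to show the shape of such a policy; front layers of pre-trained models tend to converge first, which is why a contiguous prefix is frozen.

```python
def choose_frozen_prefix(grad_norms, frac=0.1):
    """Return how many leading layers to freeze.

    grad_norms: per-layer gradient norms from a recent training step,
                ordered from the first (input-side) layer to the last.
    frac:       hypothetical threshold; a layer in the leading prefix is
                frozen if its norm is below frac * max(grad_norms).
    """
    if not grad_norms:
        return 0
    cutoff = frac * max(grad_norms)
    frozen = 0
    for g in grad_norms:
        if g < cutoff:
            frozen += 1  # this early layer has largely converged
        else:
            break  # stop at the first still-active layer
    return frozen


# Example: the first two layers have tiny gradients relative to the rest,
# so only they are frozen; training continues on the remaining layers.
print(choose_frozen_prefix([0.01, 0.02, 0.5, 1.0]))
```

Once a prefix is frozen, its outputs are fixed for a given input, which is what makes the activation caching described above possible: the forward pass through frozen layers can be replaced by a cache lookup on later epochs.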