Fine-tuning transformer models after unsupervised pre-training achieves very high performance on many NLP tasks. Unfortunately, transformers suffer from long inference times, which greatly increase costs in production and are a limiting factor for deployment on embedded devices. One possible solution is knowledge distillation, which addresses this problem by transferring information from a large teacher model to a smaller student model; however, because it requires an additional expensive pre-training phase, it is computationally costly and can be financially prohibitive for smaller academic research groups. Another solution is layer-wise pruning, which reaches high compression rates for transformer models and avoids the computational load of the pre-training distillation stage. The price to pay is that the performance of layer-wise pruning algorithms is not on par with state-of-the-art knowledge distillation methods. In this paper, greedy layer pruning (GLP) is introduced to (1) outperform the current state of the art for layer-wise pruning, (2) close the performance gap compared to knowledge distillation, while (3) using only a modest budget. More precisely, with the presented methodology it is possible to prune and evaluate competitive models on the whole GLUE benchmark with a budget of just $\$300$. Our source code is available at https://github.com/deepopinion/greedy-layer-pruning.
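As a rough illustration of the greedy idea, the sketch below tentatively removes each remaining encoder layer, scores the pruned candidate on the downstream task, and commits the removal that hurts performance the least. This is a minimal sketch, not the authors' implementation from the repository above; `get_layers` and `finetune_and_evaluate` are hypothetical helpers supplied by the caller.

```python
import copy

def greedy_layer_prune(model, get_layers, finetune_and_evaluate, n_prune):
    """Greedily remove n_prune encoder layers from `model`.

    At every step, each remaining layer is tentatively removed, the pruned
    candidate is fine-tuned and scored on the downstream validation set, and
    the single removal that degrades the score the least is kept.

    get_layers(model)            -> mutable list of encoder layers (hypothetical)
    finetune_and_evaluate(model) -> validation score after fine-tuning (hypothetical)
    """
    for _ in range(n_prune):
        best_score, best_idx = float("-inf"), None
        for idx in range(len(get_layers(model))):
            candidate = copy.deepcopy(model)
            del get_layers(candidate)[idx]           # tentatively drop one layer
            score = finetune_and_evaluate(candidate)
            if score > best_score:
                best_score, best_idx = score, idx
        del get_layers(model)[best_idx]              # commit the best removal
    return model
```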
