Dynamic early exiting has been proven to improve the inference speed of pre-trained language models like BERT. However, all samples must pass through every consecutive layer before exiting, and more complex samples usually traverse more layers, which still leaves redundant computation. In this paper, we propose SmartBERT, a novel dynamic early exiting mechanism combined with layer skipping for BERT inference, which adds a skipping gate and an exiting operator to each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. Besides, we propose cross-layer contrastive learning and incorporate it into our training phases to strengthen the intermediate layers and classifiers, which is beneficial for early exiting. To keep the usage of skipping gates consistent between the training and inference phases, we propose a hard weight mechanism during the training phase. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves a 2-3x computation reduction with minimal accuracy drops compared with BERT, and our method outperforms previous methods in both efficiency and accuracy. Moreover, on some complex datasets like RTE and WNLI, we show that entropy-based early exiting hardly works, and the skipping mechanism is essential for reducing computation.
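To make the abstract's mechanism concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of how a per-layer skipping gate and an entropy-based exiting classifier could interact at inference time; the module names, gate threshold, and entropy threshold are all hypothetical assumptions for illustration only.

```python
# Minimal sketch, assuming a per-layer skip gate on the [CLS] token and an
# entropy-based exit criterion; all thresholds and module names are hypothetical.
import torch
import torch.nn as nn


class SketchSmartLayer(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=4, batch_first=True)
        self.skip_gate = nn.Linear(hidden_size, 1)        # decides skip vs. compute
        self.exit_classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden):
        cls = hidden[:, 0]                                 # [CLS] representation
        skip = torch.sigmoid(self.skip_gate(cls)) > 0.5    # hard gate at inference
        if not skip.item():                                # batch size 1 assumed
            hidden = self.encoder(hidden)                  # compute the layer
        logits = self.exit_classifier(hidden[:, 0])        # per-layer exit head
        return hidden, logits


def infer_with_early_exit(layers, hidden, entropy_threshold=0.3):
    """Run layers sequentially; stop once the exit classifier is confident."""
    logits = None
    for layer in layers:
        hidden, logits = layer(hidden)
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
        if entropy.item() < entropy_threshold:             # low entropy -> exit early
            break
    return logits


if __name__ == "__main__":
    torch.manual_seed(0)
    layers = nn.ModuleList(SketchSmartLayer(64, 2) for _ in range(12))
    hidden = torch.randn(1, 16, 64)                        # (batch, seq_len, hidden)
    print(infer_with_early_exit(layers, hidden))
```

In this sketch, skipping avoids the cost of a layer entirely, while early exiting halts the whole forward pass; the paper's contribution is combining the two so that even samples that never become confident enough to exit can still save computation by skipping layers.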