【华为-诺亚实验室】动态BERT, Dynamic BERT with Adaptive Width and Depth - 专知

会员服务 ·

0

【华为-诺亚实验室】动态BERT, Dynamic BERT with Adaptive Width and Depth

2020 年 4 月 13 日 专知

像BERT和RoBERTa这样的预训练语言模型，尽管在许多自然语言处理任务中功能强大，但在计算和内存方面都很昂贵。为了缓解这个问题，一种方法是在部署之前对特定任务进行压缩。然而，最近的BERT压缩工作通常将大的BERT模型压缩到一个固定的更小的尺寸，并不能完全满足不同硬件性能的不同边缘器件的要求。在本文中，我们提出了一种新的动态BERT模型(简称DynaBERT)，它可以在自适应的宽度和深度上运行。DynaBERT的训练过程包括首先训练一个宽度自适应的BERT，然后通过从全尺寸的模型中提取知识到小的子网络中，允许自适应的宽度和深度。网络重布线也被用来让更多的子网络共享更重要的注意力头部和神经元。在各种效率约束下的综合实验表明，我们提出的动态BERT(或RoBERTa)在其最大尺寸下的性能与BERT(或RoBERTa)相当，而在较小的宽度和深度下，动态BERT(或RoBERTa)的性能始终优于现有的BERT压缩方法。

https://arxiv.org/abs/2004.04037

专知便捷查看

便捷下载，请关注专知公众号（点击上方蓝色专知关注）

后台回复“DBERT” 就可以获取《【华为-诺亚实验室】动态BERT, Dynamic BERT with Adaptive Width and Depth》专知下载链接

专知，专业可信的人工智能知识分发，让认知协作更快更好！欢迎注册登录专知www.zhuanzhi.ai，获取5000+AI主题干货知识资料！

欢迎微信扫一扫加入专知人工智能知识星球群，获取最新AI专业干货知识教程资料和与专家交流咨询！

点击“ 阅读原文 ”，了解使用专知 ，查看获取5000+AI主题知识资源

登录查看更多

0

相关内容

BERT

BERT全称Bidirectional Encoder Representations from Transformers，是预训练语言表示的方法，可以在大型文本语料库（如维基百科）上训练通用的“语言理解”模型，然后将该模型用于下游NLP任务，比如机器翻译、问答。

【ACL2020】DeeBERT:动态加速BERT推理，DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

【ACL2020】DeeBERT:动态加速BERT推理，DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

专知会员服务

21+阅读 · 2020年4月30日

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

专知会员服务

64+阅读 · 2020年4月28日

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

专知会员服务

13+阅读 · 2020年4月9日

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

专知会员服务

25+阅读 · 2020年3月30日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

26+阅读 · 2020年3月19日

【CVPR2020-清华大学】分辨率自适应网络的有效推理，Resolution Adaptive Networks

【CVPR2020-清华大学】分辨率自适应网络的有效推理，Resolution Adaptive Networks

专知会员服务

22+阅读 · 2020年3月17日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【清华大学】利用知识增强的图神经网络进行多段推理，Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network

【清华大学】利用知识增强的图神经网络进行多段推理，Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network

专知会员服务

95+阅读 · 2019年11月8日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

【Google-CMU】元伪标签的元学习，Meta Pseudo Labels

【Google-CMU】元伪标签的元学习，Meta Pseudo Labels

专知

48+阅读 · 2020年3月30日

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

专知

10+阅读 · 2020年3月28日

【WWW2020-新加坡国立大学】知识图谱强化负采样的推荐系统，Reinforced Negative Sampling

【WWW2020-新加坡国立大学】知识图谱强化负采样的推荐系统，Reinforced Negative Sampling

专知

22+阅读 · 2020年3月14日

BERT技术体系综述论文：40项分析探究BERT如何work

BERT技术体系综述论文：40项分析探究BERT如何work

专知

50+阅读 · 2020年3月1日

【Google AI新论文】REALM:检索增强语言模型预训练，QA的SOTA提升4-16%准确性

【Google AI新论文】REALM:检索增强语言模型预训练，QA的SOTA提升4-16%准确性

专知

12+阅读 · 2020年2月12日

最新！Yann Lecun 纽约大学Spring2020深度学习课程，附66页PPT下载

最新！Yann Lecun 纽约大学Spring2020深度学习课程，附66页PPT下载

专知

16+阅读 · 2020年1月28日

【资源】最新BERT相关论文清单汇总

【资源】最新BERT相关论文清单汇总

专知

33+阅读 · 2019年10月2日

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

专知

36+阅读 · 2019年9月29日

BERT一作Jacob Devlin斯坦福演讲PPT：BERT介绍与答疑

BERT一作Jacob Devlin斯坦福演讲PPT：BERT介绍与答疑

专知

49+阅读 · 2019年3月7日

【SIGGraphAsia2018-虚拟现实与增强现实的前沿技术PPT】

【SIGGraphAsia2018-虚拟现实与增强现实的前沿技术PPT】

专知

7+阅读 · 2018年12月8日

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Arxiv

6+阅读 · 2020年3月10日

Span-based Joint Entity and Relation Extraction with Transformer Pre-training

Arxiv

7+阅读 · 2019年9月17日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Pay Less Attention with Lightweight and Dynamic Convolutions

Pay Less Attention with Lightweight and Dynamic Convolutions

Arxiv

4+阅读 · 2019年1月29日

Neural Ordinary Differential Equations

Arxiv

6+阅读 · 2018年10月3日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning

Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning

Arxiv

3+阅读 · 2018年8月7日

Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks

Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks

Arxiv

7+阅读 · 2018年7月20日

Dynamic Weight Alignment for Convolutional Neural Networks

Arxiv

6+阅读 · 2018年1月25日

Depth-Gated LSTM

Arxiv

4+阅读 · 2015年8月25日

VIP会员

相关主题

预训练语言模型

相关VIP内容

【ACL2020】DeeBERT:动态加速BERT推理，DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

【ACL2020】DeeBERT:动态加速BERT推理，DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

专知会员服务

21+阅读 · 2020年4月30日

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

专知会员服务

64+阅读 · 2020年4月28日

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

专知会员服务

13+阅读 · 2020年4月9日

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

【LITIS Lab】衔接图卷积神经网络谱域和空间域，Spectral and Spatial Domains in GNN

专知会员服务

25+阅读 · 2020年3月30日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

26+阅读 · 2020年3月19日

【CVPR2020-清华大学】分辨率自适应网络的有效推理，Resolution Adaptive Networks

【CVPR2020-清华大学】分辨率自适应网络的有效推理，Resolution Adaptive Networks

专知会员服务

22+阅读 · 2020年3月17日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【清华大学】利用知识增强的图神经网络进行多段推理，Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network

【清华大学】利用知识增强的图神经网络进行多段推理，Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network

专知会员服务

95+阅读 · 2019年11月8日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

【Google-CMU】元伪标签的元学习，Meta Pseudo Labels

【Google-CMU】元伪标签的元学习，Meta Pseudo Labels

专知

48+阅读 · 2020年3月30日

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

【Google】微型化机器学习教程，17页ppt，Getting Started with TinyML

专知

10+阅读 · 2020年3月28日

【WWW2020-新加坡国立大学】知识图谱强化负采样的推荐系统，Reinforced Negative Sampling

【WWW2020-新加坡国立大学】知识图谱强化负采样的推荐系统，Reinforced Negative Sampling

专知

22+阅读 · 2020年3月14日

BERT技术体系综述论文：40项分析探究BERT如何work

BERT技术体系综述论文：40项分析探究BERT如何work

专知

50+阅读 · 2020年3月1日

【Google AI新论文】REALM:检索增强语言模型预训练，QA的SOTA提升4-16%准确性

【Google AI新论文】REALM:检索增强语言模型预训练，QA的SOTA提升4-16%准确性

专知

12+阅读 · 2020年2月12日

最新！Yann Lecun 纽约大学Spring2020深度学习课程，附66页PPT下载

最新！Yann Lecun 纽约大学Spring2020深度学习课程，附66页PPT下载

专知

16+阅读 · 2020年1月28日

【资源】最新BERT相关论文清单汇总

【资源】最新BERT相关论文清单汇总

专知

33+阅读 · 2019年10月2日

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

专知

36+阅读 · 2019年9月29日

BERT一作Jacob Devlin斯坦福演讲PPT：BERT介绍与答疑

BERT一作Jacob Devlin斯坦福演讲PPT：BERT介绍与答疑

专知

49+阅读 · 2019年3月7日

【SIGGraphAsia2018-虚拟现实与增强现实的前沿技术PPT】

【SIGGraphAsia2018-虚拟现实与增强现实的前沿技术PPT】

专知

7+阅读 · 2018年12月8日

相关论文

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Arxiv

6+阅读 · 2020年3月10日

Span-based Joint Entity and Relation Extraction with Transformer Pre-training

Arxiv

7+阅读 · 2019年9月17日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Pay Less Attention with Lightweight and Dynamic Convolutions

Pay Less Attention with Lightweight and Dynamic Convolutions

Arxiv

4+阅读 · 2019年1月29日

Neural Ordinary Differential Equations

Arxiv

6+阅读 · 2018年10月3日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning

Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning

Arxiv

3+阅读 · 2018年8月7日

Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks

Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks

Arxiv

7+阅读 · 2018年7月20日

Dynamic Weight Alignment for Convolutional Neural Networks

Arxiv

6+阅读 · 2018年1月25日

Depth-Gated LSTM

Arxiv

4+阅读 · 2015年8月25日

大家都在搜

CMU博士论文

无人机集群

国防科技创新

软件无线电

无人机航拍交通事故现场勘查处置系统——行业第一的警用事故处理软件

微信扫码咨询专知VIP会员