ConvBERT: Improving BERT with Span-based Dynamic Convolution - Zhuanzhi Papers

ConvBERT · BERT · MoDELS · Attention mechanism · Convolution

November 12, 2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

From arXiv, 17 pages

Pre-trained language models like BERT and its variants have recently achieved impressive performance on various natural language understanding tasks. However, BERT relies heavily on the global self-attention block and thus suffers from a large memory footprint and computation cost. Although all its attention heads query the whole input sequence to generate the attention map from a global perspective, we observe that some heads only need to learn local dependencies, which implies computational redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads and model local dependencies directly. The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments show that ConvBERT significantly outperforms BERT and its variants on various downstream tasks, with lower training cost and fewer model parameters. Remarkably, the ConvBERT-base model achieves a GLUE score of 86.4, 0.7 higher than ELECTRA-base, while using less than 1/4 of the training cost. Code and pre-trained models will be released.
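
The idea of a convolution whose kernel is generated per position from a local span can be illustrated with a small toy. The sketch below is not the authors' implementation: the function name `span_dynamic_conv`, the weight matrix `kernel_proj`, and the use of a window mean as the "span-aware key" are all assumptions made for illustration. It only shows how each token can mix information from a fixed-width neighbourhood, so the work per token grows with the window width rather than with the sequence length.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def span_dynamic_conv(values, queries, kernel_proj, width=3):
    """Toy convolution whose kernel is generated per position from a local span.

    values, queries: lists of n vectors, each a list of d floats.
    kernel_proj: hypothetical weights, `width` rows of d floats, mapping the
        mixed query/span features to `width` kernel logits.
    width: odd window size; positions outside the sequence are zero-padded.
    """
    n, d = len(values), len(values[0])
    half = width // 2
    out = []
    for i in range(n):
        span = [i + o for o in range(-half, half + 1)]
        inside = [j for j in span if 0 <= j < n]
        # Assumption: summarise the span by the mean of its value vectors.
        key = [sum(values[j][c] for j in inside) / len(inside) for c in range(d)]
        # Kernel logits depend on the query and on the local span summary.
        mixed = [queries[i][c] * key[c] for c in range(d)]
        logits = [sum(row[c] * mixed[c] for c in range(d)) for row in kernel_proj]
        kernel = softmax(logits)
        # Apply the position-specific kernel to the values inside the window.
        vec = [0.0] * d
        for w, j in zip(kernel, span):
            if 0 <= j < n:
                for c in range(d):
                    vec[c] += w * values[j][c]
        out.append(vec)
    return out

# Small deterministic example: 5 tokens, 2 features, window of 3.
values = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0], [-1.0, 1.5], [0.0, 0.5]]
queries = [[0.3, -0.7], [1.0, 0.2], [-0.5, 0.5], [0.8, 0.8], [0.1, -0.1]]
kernel_proj = [[0.1, 0.2], [0.5, -0.4], [0.0, 0.9]]
mixed_tokens = span_dynamic_conv(values, queries, kernel_proj, width=3)
```

Each output vector here is a weighted combination of at most `width` neighbouring value vectors, which is the sense in which such a head models local dependencies only; a global self-attention head would instead weight all `n` positions for every token.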
