We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
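The abstract's central idea, learning a binary mask over frozen pretrained weights instead of updating them, can be illustrated with a minimal sketch. The snippet below is not the authors' exact implementation; the threshold value, score initialization, and the straight-through gradient estimator are assumptions chosen to make the example runnable and to reflect how binary masks are commonly trained.

```python
# Minimal sketch (assumptions noted above): pretrained weights stay frozen,
# and only real-valued mask scores are trained per task. The scores are
# hard-thresholded to a 0/1 mask in the forward pass, with gradients passed
# through via a straight-through estimator.
import torch
import torch.nn as nn


class BinaryMaskStraightThrough(torch.autograd.Function):
    """Threshold scores to {0, 1}; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, threshold):
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat the thresholding as identity.
        return grad_output, None


class MaskedLinear(nn.Module):
    """A linear layer whose frozen pretrained weight is selectively masked."""

    def __init__(self, pretrained_linear: nn.Linear, threshold: float = 0.5):
        super().__init__()
        # Freeze the pretrained weight and bias.
        self.weight = nn.Parameter(
            pretrained_linear.weight.detach(), requires_grad=False
        )
        self.bias = (
            nn.Parameter(pretrained_linear.bias.detach(), requires_grad=False)
            if pretrained_linear.bias is not None
            else None
        )
        # Only these real-valued scores are learned for each downstream task
        # (initialization slightly above the threshold is an assumption).
        self.mask_scores = nn.Parameter(torch.full_like(self.weight, 0.51))
        self.threshold = threshold

    def forward(self, x):
        mask = BinaryMaskStraightThrough.apply(self.mask_scores, self.threshold)
        return nn.functional.linear(x, self.weight * mask, self.bias)


# Usage: wrap an existing pretrained layer; only mask_scores receive gradients.
layer = MaskedLinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```

Because the underlying weights are shared and unchanged across tasks, serving a new task only requires storing one bit per masked weight, which is the source of the memory savings claimed when several tasks are served simultaneously.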