Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks whose transfer learning performance is similar to that of the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that BERT subnetworks have even more potential than these studies have shown. First, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which better preserves the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, agnostic to any specific downstream task. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall downstream performance. Moreover, our method is more efficient in searching for subnetworks and more advantageous when fine-tuning with limited amounts of downstream data. Our code is available at https://github.com/llyx97/TAMT.
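To make the "train binary masks over model weights" idea concrete, below is a minimal PyTorch sketch of one common way to do this: each frozen weight matrix gets a real-valued score matrix that is binarized in the forward pass, and gradients reach the scores through a straight-through estimator. This is an illustrative assumption, not the authors' implementation from the TAMT repository; the class names (MaskedLinear, Binarize), the magnitude-based score initialization, and the placeholder loss standing in for the pre-training objective (e.g. MLM) are all hypothetical.

```python
# Minimal sketch of binary mask training over frozen pre-trained weights
# (illustrative only; not the TAMT code). Only the mask scores are trained,
# against a pre-training-style objective, while the weights stay frozen.

import torch
import torch.nn as nn


class Binarize(torch.autograd.Function):
    """Threshold scores to a {0, 1} mask; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, sparsity):
        # Keep the top-(1 - sparsity) fraction of weights by score.
        k = int((1 - sparsity) * scores.numel())
        threshold = torch.topk(scores.flatten(), k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: identity gradient w.r.t. the scores.
        return grad_output, None


class MaskedLinear(nn.Module):
    """A frozen linear layer whose weights are gated by a trainable binary mask."""

    def __init__(self, linear: nn.Linear, sparsity: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(linear.bias.detach(), requires_grad=False)
        # Initialize scores from weight magnitudes (one common choice).
        self.scores = nn.Parameter(self.weight.abs().clone())
        self.sparsity = sparsity

    def forward(self, x):
        mask = Binarize.apply(self.scores, self.sparsity)
        return nn.functional.linear(x, self.weight * mask, self.bias)


if __name__ == "__main__":
    layer = MaskedLinear(nn.Linear(16, 16), sparsity=0.7)
    optimizer = torch.optim.Adam([layer.scores], lr=1e-2)
    x = torch.randn(4, 16)
    # Placeholder objective standing in for the pre-training loss (e.g. MLM).
    loss = layer(x).pow(2).mean()
    loss.backward()
    optimizer.step()
```

Because only the scores receive gradients, the discovered subnetwork structure depends solely on the (task-agnostic) pre-training objective, which is what allows the resulting mask to be reused when fine-tuning on GLUE or SQuAD.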