Doge 入场券: 通过播放彩票来隐藏域通用语言模型 (Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets) - 专知论文

会员服务 ·

0

语言模型化 · Learning · MoDELS · motivation · 方差 ·

2022 年 9 月 19 日

Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets

翻译：Doge 入场券: 通过播放彩票来隐藏域通用语言模型

Yi Yang,Chen Zhang,Benyou Wang,Dawei Song

from arxiv, Accepted to NLPCC 2022. Code is available at https://github.com/Ylily1015/DogeTickets

Over-parameterized models, typically pretrained language models (LMs), have shown an appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we for the first time posit that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). In order to intervene the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, Mnli and OntoNotes datasets. The results show that the doge tickets obtains an improved out-of-domain generalization in comparison with a range of competitive baselines. Analysis results further hint the existence of domain-general parameters and the performance consistency of doge tickets.

翻译：由于学习偏差小,超参数模型(通常为预先训练的语言模型)显示出了极具吸引力的表情力量,然而,LM模型的庞大学习能力也会导致巨大的学习差异。在一项试点研究中,我们发现,在面临多个领域时,参数的关键部分以特定领域的方式出乎意料地出现,而另一些参数则以一般领域的形式出现。受这一现象的驱使,我们首次认为,域性一般参数可以支持原LM产生的域性一般LM。为了发现域性通用LM,我们提议通过玩彩票(双面彩票)来确定域性一般参数。为了干预彩票,我们提议了一个域性总分,说明域性变差是如何与差异相联系的。在亚马孙、Mnli和Onto Notes数据集上进行了全面实验。结果显示, doge机票与一系列竞争性基线相比,在外面性一般化方面得到了改进。分析结果进一步暗示了域性票参数的存在和性能的一致性。

0

相关内容

语言模型化

语言模型化

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

云仿真中的计算资源分配方法研究

国家自然科学基金

3+阅读 · 2013年12月31日

CHOP 调控ERO1α在急性肝损伤中的作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

ADS中检测快中子束的GEM探测器的研制

国家自然科学基金

0+阅读 · 2013年12月31日

混凝土Weibull统计尺寸效应理论模型改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

CAP70经调节PTEN磷酸化影响肾癌恶性表型的研究

国家自然科学基金

0+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

LncRNA-MEG3对猪骨骼肌细胞增殖的作用及调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

LncRNAs在非小细胞肺癌EGFR-TKIs耐药中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Cystatin B缺失与Prion疾病自噬作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Full-band General Audio Synthesis with Score-based Diffusion

Arxiv

0+阅读 · 2022年10月26日

Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations

Arxiv

0+阅读 · 2022年10月25日

Parameter-Efficient Legal Domain Adaptation

Arxiv

0+阅读 · 2022年10月25日

Deep Learning for Medical Image Segmentation: Tricks, Challenges and Future Directions

Arxiv

21+阅读 · 2022年9月21日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Domain Generalization in Vision: A Survey

Arxiv

16+阅读 · 2021年7月18日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Bridging the Gap Between Spectral and Spatial Domains in Graph Neural Networks

Bridging the Gap Between Spectral and Spatial Domains in Graph Neural Networks

Arxiv

15+阅读 · 2020年3月26日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型时代的文档智能：综述

蜂窝通信是否是无人机与无人地面战车主宰战场的关键？

文档视觉问答简述

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

相关论文

Full-band General Audio Synthesis with Score-based Diffusion

Arxiv

0+阅读 · 2022年10月26日

Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations

Arxiv

0+阅读 · 2022年10月25日

Parameter-Efficient Legal Domain Adaptation

Arxiv

0+阅读 · 2022年10月25日

Deep Learning for Medical Image Segmentation: Tricks, Challenges and Future Directions

Arxiv

21+阅读 · 2022年9月21日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Domain Generalization in Vision: A Survey

Arxiv

16+阅读 · 2021年7月18日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Bridging the Gap Between Spectral and Spatial Domains in Graph Neural Networks

Bridging the Gap Between Spectral and Spatial Domains in Graph Neural Networks

Arxiv

15+阅读 · 2020年3月26日

相关基金

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

云仿真中的计算资源分配方法研究

国家自然科学基金

3+阅读 · 2013年12月31日

CHOP 调控ERO1α在急性肝损伤中的作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

ADS中检测快中子束的GEM探测器的研制

国家自然科学基金

0+阅读 · 2013年12月31日

混凝土Weibull统计尺寸效应理论模型改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

CAP70经调节PTEN磷酸化影响肾癌恶性表型的研究

国家自然科学基金

0+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

LncRNA-MEG3对猪骨骼肌细胞增殖的作用及调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

LncRNAs在非小细胞肺癌EGFR-TKIs耐药中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Cystatin B缺失与Prion疾病自噬作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员