With increasing scale, large language models demonstrate both quantitative improvements and new qualitative capabilities, especially as zero-shot learners such as GPT-3. However, these results rely heavily on delicate prompt design and large-scale computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To this end, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with large language models such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M), which reaches the average performance of OPT (175B) on 9 datasets.
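The abstract describes Go-tuning as continued masked language modeling on a small amount of task-aware self-supervised data. The sketch below illustrates only that general recipe with Hugging Face Transformers and T5-small; the single-span masking scheme, the example sentences, and the hyperparameters are placeholder assumptions, not the paper's geometry-guided procedure.

```python
# Minimal sketch (not the released Go-tuning code): continue masked language
# modeling on a few task-aware, unlabeled sentences to update T5-small.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")  # ~80M params
tokenizer = AutoTokenizer.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed lr

# Hypothetical task-aware, self-supervised texts (no gold labels are used).
task_aware_texts = [
    "The movie was surprisingly moving and the acting felt genuine.",
    "The service was slow and the food arrived cold.",
]

model.train()
for text in task_aware_texts:
    # Illustrative single-span corruption: hide one word with a T5 sentinel
    # token and train the model to reconstruct it (T5-style denoising).
    words = text.split()
    target_word = words[3]
    words[3] = "<extra_id_0>"
    inputs = tokenizer(" ".join(words), return_tensors="pt")
    labels = tokenizer(f"<extra_id_0> {target_word} <extra_id_1>",
                       return_tensors="pt").input_ids

    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```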