Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought (CoT) data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B, instruction-finetuned on 1.8K tasks, outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
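To illustrate what "publicly released checkpoints" means in practice, here is a minimal zero-shot inference sketch. It assumes the Hugging Face `transformers` library and the `google/flan-t5-base` checkpoint identifier, neither of which is specified in the abstract itself; because the model is instruction-finetuned, a plain natural-language instruction suffices as the prompt, with no few-shot exemplars.

```python
# Minimal zero-shot usage sketch for a released Flan-T5 checkpoint.
# Assumes the Hugging Face `transformers` library; "google/flan-t5-base"
# is the public checkpoint name (larger variants: -large, -xl, -xxl).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# An instruction-finetuned model takes the task description directly.
prompt = "Answer the following question. What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```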