元1.0:零热和少热学习的大型预培训语言模式 (Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning) - 专知论文

会员服务 ·

0

语言模型化 · Performer · 小样本学习 · MoDELS · GPT-3 ·

2021 年 10 月 10 日

Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning

翻译：元1.0:零热和少热学习的大型预培训语言模式

Shaohua Wu,Xudong Zhao,Tong Yu,Rongguo Zhang,Chong Shen,Hongli Liu,Feng Li,Hong Zhu,Jiangang Luo,Liang Xu,Xuanwei Zhang,Jun Liu

Recent work like GPT-3 has demonstrated excellent performance of Zero-Shot and Few-Shot learning on many natural language processing (NLP) tasks by scaling up model size, dataset size and the amount of computation. However, training a model like GPT-3 requires huge amount of computational resources which makes it challengeable to researchers. In this work, we propose a method that incorporates large-scale distributed training performance into model architecture design. With this method, Yuan 1.0, the current largest singleton language model with 245B parameters, achieves excellent performance on thousands GPUs during training, and the state-of-the-art results on NLP tasks. A data processing method is designed to efficiently filter massive amount of raw data. The current largest high-quality Chinese corpus with 5TB high quality texts is built based on this method. In addition, a calibration and label expansion method is proposed to improve the Zero-Shot and Few-Shot performance, and steady improvement is observed on the accuracy of various tasks. Yuan 1.0 presents strong capacity of natural language generation, and the generated articles are difficult to distinguish from the human-written ones.

翻译：GPT-3等近期工作通过扩大模型规模、数据集大小和计算量,展示了许多自然语言处理(NLP)任务中的零热和少热学习的出色表现。然而,GPT-3等模型的培训需要大量的计算资源,使研究人员能够对此提出质疑。在这项工作中,我们提出了一种方法,将大规模分布式培训绩效纳入模型结构设计中。用这种方法,目前最大的单吨语言模式,有245B参数的元1.0,在培训期间对数千个通用语言进行了出色的表现,并取得了NLP任务的最新成果。设计了一个数据处理方法,以便有效地过滤大量原始数据。目前最大的高质量中国高品质的5TB高质量文本是建立在这一方法基础上的。此外,还提出了一种校准和标签扩展方法,以改进零热和少热的性能,并观察到各种任务的准确性能稳步改进。 ⁇ 1.0展示了天然语言生成的强大能力,所产生的文章很难与人写成的文字区分。

0

相关内容

语言模型化

语言模型化

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

专知会员服务

77+阅读 · 2020年6月28日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【google】监督对比学习，Supervised Contrastive Learning

【google】监督对比学习，Supervised Contrastive Learning

专知会员服务

32+阅读 · 2020年4月23日

【香港科技大学】最新《小样本学习(Few-shot learning)》2020综述论文大全，34页pdf166篇参考文献

【香港科技大学】最新《小样本学习(Few-shot learning)》2020综述论文大全，34页pdf166篇参考文献

专知会员服务

210+阅读 · 2020年4月13日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

专知会员服务

48+阅读 · 2019年6月10日

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

AINLP

8+阅读 · 2020年6月3日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

小样本学习（Few-shot Learning）综述

小样本学习（Few-shot Learning）综述

云栖社区

22+阅读 · 2019年4月6日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning

Arxiv

6+阅读 · 2021年5月26日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

OntoZSL: Ontology-enhanced Zero-shot Learning

Arxiv

17+阅读 · 2021年2月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Enriching Pre-trained Language Model with Entity Information for Relation Classification

Arxiv

5+阅读 · 2019年5月20日

Meta-Transfer Learning for Few-Shot Learning

Meta-Transfer Learning for Few-Shot Learning

Arxiv

4+阅读 · 2019年4月9日

VIP会员

文章信息

相关主题

语言模型化

小样本学习

相关VIP内容

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

专知会员服务

77+阅读 · 2020年6月28日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【google】监督对比学习，Supervised Contrastive Learning

【google】监督对比学习，Supervised Contrastive Learning

专知会员服务

32+阅读 · 2020年4月23日

【香港科技大学】最新《小样本学习(Few-shot learning)》2020综述论文大全，34页pdf166篇参考文献

【香港科技大学】最新《小样本学习(Few-shot learning)》2020综述论文大全，34页pdf166篇参考文献

专知会员服务

210+阅读 · 2020年4月13日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

专知会员服务

48+阅读 · 2019年6月10日

热门VIP内容

开通专知VIP会员享更多权益服务

模型提取攻击与防御的系统综述：最新进展与展望

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

【CMU博士论文】用于物理模拟的高效深度学习模型

大模型解决方案白皮书：社交陪伴场景全流程落地指南

相关资讯

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

AINLP

8+阅读 · 2020年6月3日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

小样本学习（Few-shot Learning）综述

小样本学习（Few-shot Learning）综述

云栖社区

22+阅读 · 2019年4月6日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning

Arxiv

6+阅读 · 2021年5月26日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

OntoZSL: Ontology-enhanced Zero-shot Learning

Arxiv

17+阅读 · 2021年2月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Enriching Pre-trained Language Model with Entity Information for Relation Classification

Arxiv

5+阅读 · 2019年5月20日

Meta-Transfer Learning for Few-Shot Learning

Meta-Transfer Learning for Few-Shot Learning

Arxiv

4+阅读 · 2019年4月9日

微信扫码咨询专知VIP会员