It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners - Zhuanzhi Papers

Language modeling · Few-shot learning · MoDELS · Learners · GPT-3

April 12, 2021

It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Timo Schick, Hinrich Schütze

From arXiv; accepted at NAACL 2021

When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much "greener" in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.
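The cloze reformulation mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the pattern text in `to_cloze`, the label-to-word mapping `VERBALIZER`, the `[MASK]` placeholder, and the made-up scores that stand in for a masked language model's output at the mask position. In a gradient-based setup, the resulting label probabilities would typically feed a cross-entropy loss on the few labeled examples.

```python
import math
from typing import Dict

MASK = "[MASK]"  # placeholder; the real mask token depends on the model used

# Hypothetical verbalizer: maps each task label to a single word
# that a masked language model could predict at the mask position.
VERBALIZER = {"positive": "great", "negative": "terrible"}


def to_cloze(review: str) -> str:
    """Wrap a raw input in a hypothetical pattern that includes a task description."""
    return (
        f"Review: {review} "
        f"Question: is this review positive or negative? "
        f"Answer: It was {MASK}."
    )


def label_probs(mask_scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax restricted to the scores of the verbalizer words."""
    logits = {label: mask_scores[word] for label, word in VERBALIZER.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - top) for label, score in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


cloze = to_cloze("Best pizza ever!")
# Made-up scores standing in for a model's output at the mask position.
fake_scores = {"great": 2.0, "terrible": 0.0, "pizza": 1.0}
probs = label_probs(fake_scores)
print(cloze)
print(probs)
```

Restricting the softmax to the verbalizer words turns a vocabulary-sized prediction into a distribution over task labels, which is what lets a small pretrained model be used for classification without adding a new output layer.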
