Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5's approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.
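The core idea behind SAP can be illustrated concretely. Because mT5 is pre-trained only to fill sentinel spans, it cannot be sampled left-to-right directly; SAP instead calls the model repeatedly, each time appending a sentinel token to the running output and keeping the span the model fills in. The sketch below, written in Python against the Hugging Face transformers API, shows this loop in its simplest form. The checkpoint name, step sizes, and the `sap_generate` helper are illustrative assumptions, not the authors' released code, and the published method adds scoring and filtering steps not captured here.

```python
# A minimal sketch of the sequential prompting loop behind SAP (an assumption
# of how the technique can be realized, not the paper's implementation):
# repeatedly ask mT5 to fill a single <extra_id_0> sentinel appended to the
# end of the running text, then append what it generates.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

MODEL_NAME = "google/mt5-xl"  # assumption: any mT5 checkpoint would do
tokenizer = MT5Tokenizer.from_pretrained(MODEL_NAME)
model = MT5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def sap_generate(prompt: str, max_steps: int = 40, tokens_per_step: int = 4) -> str:
    """Extend `prompt` by repeatedly filling a trailing <extra_id_0> sentinel."""
    text = prompt
    for _ in range(max_steps):
        inputs = tokenizer(text + " <extra_id_0>", return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=tokens_per_step + 2)
        decoded = tokenizer.decode(output_ids[0], skip_special_tokens=False)
        # Keep only the text the model proposed for the sentinel slot.
        filled = decoded.split("<extra_id_0>")[-1].split("<extra_id_1>")[0]
        filled = filled.replace("<pad>", "").replace("</s>", "").strip()
        if not filled:  # model produced nothing new: stop
            break
        text += " " + filled
    return text

# Few-shot machine translation, echoing the paper's case study
# (the exemplars below are made up for illustration):
prompt = (
    "English: I like apples. French: J'aime les pommes.\n"
    "English: The weather is nice. French:"
)
print(sap_generate(prompt))
```

Generating only a few tokens per call keeps each step inside the span-infilling regime the model was pre-trained on, which is what lets a purely bidirectional denoising model behave autoregressively.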