Large-scale pretraining instills large amounts of knowledge in deep neural networks. This, in turn, improves the generalization behavior of these models in downstream tasks. What exactly are the limits of the generalization benefits of large-scale pretraining? Here, we report observations from simple experiments aimed at addressing this question in the context of two semantic parsing tasks involving natural language, SCAN and COGS. We show that language models pretrained exclusively on non-English corpora, or even on programming language corpora, significantly improve out-of-distribution generalization on these benchmarks compared with models trained from scratch, even though both benchmarks are English-based. This demonstrates the surprisingly broad transferability of pretrained representations and knowledge. Pretraining with a large-scale protein sequence prediction task, on the other hand, mostly degrades generalization performance on SCAN and COGS, suggesting that pretrained representations do not transfer universally and that there are constraints on how dissimilar the pretraining and downstream domains can be for transfer to succeed. Finally, we show that larger models are harder to train from scratch and reach lower generalization accuracy when trained to convergence on the relatively small SCAN and COGS datasets, but the benefits of large-scale pretraining become much clearer with larger models.
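To make the comparison concrete, the setup can be pictured as fine-tuning a pretrained seq2seq language model on (input, output) pairs from a benchmark like SCAN and contrasting it with the same architecture trained from scratch. The sketch below is only illustrative and not the paper's exact configuration: the checkpoint name "t5-small", the hyperparameters, and the toy data pairs are assumptions standing in for the actual models and datasets used.

```python
# Hedged sketch, not the authors' exact setup: compares fine-tuning a
# pretrained seq2seq checkpoint against training the same architecture
# from random initialization on toy SCAN-style command/action pairs.
import torch
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "t5-small"  # placeholder checkpoint; assumption, not from the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


def build_model(pretrained: bool):
    """Return either the pretrained checkpoint or a randomly initialized
    model with the same architecture (the 'from scratch' baseline)."""
    if pretrained:
        return AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    config = AutoConfig.from_pretrained(MODEL_NAME)
    return AutoModelForSeq2SeqLM.from_config(config)


def finetune(model, pairs, epochs=3, lr=1e-4, device="cpu"):
    """Minimal fine-tuning loop over (input, target) string pairs,
    e.g. SCAN commands and their action sequences."""
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for src, tgt in pairs:
            enc = tokenizer(src, return_tensors="pt").to(device)
            labels = tokenizer(tgt, return_tensors="pt").input_ids.to(device)
            loss = model(**enc, labels=labels).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model


# Toy pairs just to make the sketch runnable end to end (hypothetical data).
toy_pairs = [("jump twice", "JUMP JUMP"), ("walk left", "LTURN WALK")]
pretrained_model = finetune(build_model(pretrained=True), toy_pairs)
scratch_model = finetune(build_model(pretrained=False), toy_pairs)
```

In this framing, out-of-distribution generalization would be measured by evaluating both models on held-out compositional splits of the benchmark rather than on the training distribution.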