Transformer-based pretrained language models achieve outstanding results on many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing, and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not consistently perform better across the different syntactic phenomena and come at a higher financial and environmental cost.