The Diminishing Returns of Masked Language Models to Science

May 3, 2023


Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster

From arXiv. 12 pages, 3 figures, 5 tables. Accepted to the Findings of ACL 2023.

Transformer-based masked language models such as BERT, trained on general corpora, have shown impressive performance on downstream tasks. It has also been demonstrated that the downstream task performance of such models can be improved by pretraining larger models for longer on more data. In this work, we empirically evaluate the extent to which these results extend to tasks in science. We use 14 domain-specific transformer-based models (including ScholarBERT, a new 770M-parameter science-focused masked language model pretrained on up to 225B tokens) to evaluate the impact of training data, model size, and pretraining and finetuning time on 12 downstream scientific tasks. Interestingly, we find that increasing model size, training data, or compute time does not always lead to significant improvements (i.e., >1% F1), if any, in scientific information extraction tasks, and we offer possible explanations for the surprising performance differences.
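To make the ">1% F1" criterion concrete, the following is a minimal sketch of how such a comparison could be computed. It assumes the threshold means more than one absolute percentage point of F1, which the abstract does not spell out. The function names `f1_score` and `is_significant_gain` and the counts in the usage lines are hypothetical illustrations, not values or code from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    # With no predictions and no gold labels, F1 is undefined; return 0.0 by convention.
    return 0.0 if denom == 0 else 2 * tp / denom


def is_significant_gain(baseline_f1: float, candidate_f1: float,
                        threshold_points: float = 1.0) -> bool:
    """True if the candidate beats the baseline by more than `threshold_points`
    absolute percentage points of F1 (assumed reading of ">1% F1")."""
    return (candidate_f1 - baseline_f1) * 100 > threshold_points


# Illustrative counts only (not results from the paper).
baseline = f1_score(tp=80, fp=20, fn=20)   # 160 / 200 = 0.80
candidate = f1_score(tp=81, fp=20, fn=19)  # 162 / 201, roughly 0.806
print(is_significant_gain(baseline, candidate))  # gain is under one point
```

Under this reading, a larger model that moves F1 from 0.80 to about 0.806 would not count as a significant improvement, while a move from 0.80 to 0.85 would.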
