Are Pre-trained Convolutions Better than Pre-trained Transformers? - 专知论文

January 30, 2022

Are Pre-trained Convolutions Better than Pre-trained Transformers?

Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

From arXiv; ACL 2021, with updated code and checkpoint pointers

In the era of pre-trained language models, Transformers are the de facto choice of model architectures. While recent research has shown promise in entirely convolutional, or CNN, architectures, they have not been explored using the pre-train-fine-tune paradigm. In the context of language models, are convolutional models competitive to Transformers when pre-trained? This paper investigates this research question and presents several interesting findings. Across an extensive set of experiments on 8 datasets/tasks, we find that CNN-based pre-trained models are competitive and outperform their Transformer counterpart in certain scenarios, albeit with caveats. Overall, the findings outlined in this paper suggest that conflating pre-training and architectural advances is misguided and that both advances should be considered independently. We believe our research paves the way for a healthy amount of optimism in alternative architectures.

