Pretraining Without Attention (专知 Papers)

Attention · model evaluation · MoDELS · layers · gating

December 20, 2022

Pretraining Without Attention

Junxiong Wang,Jing Nathan Yan,Albert Gu,Alexander M. Rush

Transformers have been essential to pretraining success in NLP. Other architectures have been used, but require attention layers to match benchmark accuracy. This work explores pretraining without attention. We test recently developed routing layers based on state-space models (SSM) and model architectures based on multiplicative gating. Used together these modeling choices have a large impact on pretraining accuracy. Empirically the proposed Bidirectional Gated SSM (BiGS) replicates BERT pretraining results without attention and can be extended to long-form pretraining of 4096 tokens without approximation.
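To make the two ingredients named in the abstract concrete, here is a toy sketch of a state-space recurrence run in both directions and combined with a multiplicative gate. It is not the authors' BiGS architecture: the scalar state, the sigmoid gate, and the names `ssm_scan`, `bidirectional_gated_ssm`, `a`, `b`, `c`, and `w_gate` are all assumptions chosen for illustration.

```python
import math


def ssm_scan(xs, a, b, c):
    """Scalar linear state-space recurrence (illustrative only).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_gated_ssm(xs, a=0.5, b=1.0, c=1.0, w_gate=1.0):
    """Mix a left-to-right and a right-to-left scan, then gate the result.

    The gate is a sigmoid of the current input (an assumed choice); the
    multiplication is elementwise, so no pairwise attention matrix is built.
    """
    fwd = ssm_scan(xs, a, b, c)
    # Run the same recurrence on the reversed sequence, then restore order.
    bwd = ssm_scan(xs[::-1], a, b, c)[::-1]
    out = []
    for x, f, r in zip(xs, fwd, bwd):
        gate = 1.0 / (1.0 + math.exp(-w_gate * x))
        out.append(gate * (f + r))
    return out


if __name__ == "__main__":
    print(bidirectional_gated_ssm([1.0, 0.0, 0.0]))
```

Each scan visits every position once, so the cost of this sketch grows linearly with sequence length, in contrast to the quadratic pairwise comparison of full self-attention.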
