CoLT5: Faster Long-Range Transformers with Conditional Computation

March 17, 2023

Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
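
The abstract describes conditional computation only at a high level, so the following is a minimal sketch of the general idea for the feedforward case, not the paper's implementation. It assumes a top-k routing rule, a fixed router weight vector, and toy stand-ins for the cheap and costly branches; the names `light_ffn`, `heavy_ffn`, `conditional_ffn`, and `router_weights` are hypothetical and do not come from the source. Every token receives the cheap update, and only the `k` highest-scoring tokens also receive the costly one.

```python
import math
import random


def light_ffn(x):
    """Cheap per-token transform applied to every token (hypothetical stand-in)."""
    return [0.1 * v for v in x]


def heavy_ffn(x):
    """Costlier per-token transform applied only to routed tokens (hypothetical stand-in)."""
    return [math.tanh(v) for v in x]


def conditional_ffn(tokens, router_weights, k):
    """Apply light_ffn to all tokens and heavy_ffn to the k highest-scoring ones.

    Returns the updated tokens and the sorted indices that took the heavy branch.
    Top-k selection is an assumption of this sketch.
    """
    # Router score: dot product of each token with a fixed weight vector.
    scores = [sum(w * v for w, v in zip(router_weights, tok)) for tok in tokens]
    # Indices of the k largest scores.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    selected = set(top)

    out = []
    for i, tok in enumerate(tokens):
        update = light_ffn(tok)
        if i in selected:
            heavy = heavy_ffn(tok)
            update = [u + h for u, h in zip(update, heavy)]
        # Residual connection: add the update back onto the input token.
        out.append([t + u for t, u in zip(tok, update)])
    return out, sorted(selected)


random.seed(0)
tokens = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
router_weights = [1.0, -0.5, 0.25, 0.0]
updated, routed = conditional_ffn(tokens, router_weights, k=2)
print(len(updated), routed)
```

With `k` much smaller than the number of tokens, the costly branch runs on only a small fraction of the input, which is the source of the savings the abstract refers to. The sketch does not attempt to reproduce how the paper's router is parameterized or trained, nor the attention-layer variant.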
