Various natural language processing (NLP) tasks require models that are small and efficient enough for their ultimate deployment at the edge or in other resource-constrained environments. While prior research has reduced the size of these models, increasing computational efficiency without considerable performance degradation remains difficult, especially for autoregressive tasks. This paper proposes \textit{modular linearized attention (MLA)}, which combines multiple efficient attention mechanisms, including cosFormer \cite{zhen2022cosformer}, to maximize inference quality while achieving notable speedups. We validate this approach on several autoregressive NLP tasks, including speech-to-text neural machine translation (S2T NMT), speech-to-text simultaneous translation (SimulST), and autoregressive text-to-spectrogram synthesis (TTS), noting efficiency gains for TTS and competitive performance for NMT and SimulST during training and inference.
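As background for the linearization that cosFormer-style mechanisms build on, the sketch below restates the standard kernel-based reformulation of attention from the linear-attention literature; the feature map $\phi$ is left abstract here for illustration and is not necessarily the exact choice made by MLA.
\[
\mathcal{A}(Q, K, V)_i
  = \frac{\sum_{j} \operatorname{sim}(q_i, k_j)\, v_j}{\sum_{j} \operatorname{sim}(q_i, k_j)},
\qquad
\operatorname{sim}(q_i, k_j) = \phi(q_i)\, \phi(k_j)^{\top},
\]
\[
\mathcal{A}(Q, K, V)_i
  = \frac{\phi(q_i) \sum_{j} \phi(k_j)^{\top} v_j}{\phi(q_i) \sum_{j} \phi(k_j)^{\top}},
\]
which replaces the quadratic cost of softmax attention with a cost linear in sequence length, since the sums over $j$ can be computed once and reused, or maintained incrementally in the autoregressive setting.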