A major challenge in deploying transformer models is their prohibitive inference cost, which scales quadratically with the input sequence length. This makes it especially difficult to use transformers for processing long sequences. To address this, we present a novel Learned Token Pruning (LTP) method that removes redundant tokens as the data passes through the different layers of the transformer. In particular, LTP prunes tokens whose attention score falls below a threshold value that is learned during training. Importantly, our threshold-based method avoids algorithmically expensive operations such as the top-k token selection used in prior token pruning methods, and it also leads to structured pruning. We extensively test the performance of our approach on multiple GLUE tasks and show that our learned threshold-based method consistently outperforms the prior state-of-the-art top-k based method, achieving up to ~2% higher accuracy with the same amount of FLOPs. Furthermore, our preliminary results show up to 1.4x and 1.9x throughput improvement on a Tesla T4 GPU and an Intel Haswell CPU, respectively, with less than 1% accuracy drop (and up to 2.1x FLOPs reduction). Our code is developed in PyTorch and has been open-sourced.
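To make the core idea concrete, the sketch below shows one simplified way threshold-based token pruning can be applied at a transformer layer: each token's importance is taken as the average attention it receives, and tokens whose importance falls below a learned threshold are dropped from subsequent computation. This is a minimal PyTorch illustration under our own assumptions; the function and argument names are hypothetical and it is not the paper's exact implementation.

```python
import torch


def threshold_token_pruning(hidden_states, attention_probs, threshold, keep_mask):
    """Minimal sketch of threshold-based token pruning at one layer.

    hidden_states:   (batch, seq_len, hidden)        token representations
    attention_probs: (batch, heads, seq_len, seq_len) softmax attention of the layer
    threshold:       scalar tensor, learned during training
    keep_mask:       (batch, seq_len) 1.0 for tokens still kept, 0.0 for pruned
    """
    # Importance of each token: attention it receives, averaged over
    # heads (dim=1) and over query positions (dim=1 after the first mean).
    importance = attention_probs.mean(dim=1).mean(dim=1)  # (batch, seq_len)

    # A token survives only if it was kept so far AND its importance
    # exceeds the learned threshold (hard comparison, used at inference).
    new_mask = keep_mask * (importance > threshold).float()

    # Here pruned tokens are simply zeroed out for clarity; an actual
    # implementation would remove them to realize the FLOPs savings.
    pruned_states = hidden_states * new_mask.unsqueeze(-1)
    return pruned_states, new_mask
```

Note that the hard comparison above is not differentiable, so learning the threshold during training requires a soft relaxation of this step, which the sketch omits for brevity.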