Deploying transformer models in practice is challenging due to their inference cost, which scales quadratically with input sequence length. To address this, we present a novel Learned Token Pruning (LTP) method that adaptively removes unimportant tokens as an input sequence passes through transformer layers. In particular, LTP prunes tokens whose attention score falls below a threshold value that is learned for each layer during training. Our threshold-based method allows the length of the pruned sequence to vary adaptively with the input, and avoids algorithmically expensive operations such as top-k token selection. We extensively evaluate LTP on GLUE tasks and show that our method outperforms prior state-of-the-art token pruning methods by up to ~2.5% higher accuracy at the same FLOPs. In particular, LTP achieves up to 2.1x FLOPs reduction with less than 1% accuracy drop, which results in up to 1.9x and 2.0x throughput improvement on Intel Haswell CPUs and NVIDIA V100 GPUs, respectively. Furthermore, we demonstrate that LTP is more robust than prior methods to variation in input sequence length. Our code is developed in PyTorch and has been open-sourced.
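The core mechanism described above — scoring each token by the attention it receives and dropping tokens whose score falls below a per-layer learned threshold — can be illustrated with a minimal sketch. This is our own simplified rendering, not the paper's implementation: the function names (`token_importance`, `prune_tokens`) and the choice of averaging attention over heads and queries are assumptions for illustration.

```python
import numpy as np

def token_importance(attn_probs):
    """Importance of each token, taken here as the average attention it
    receives across all heads and all query positions.

    attn_probs: array of shape (num_heads, seq_len, seq_len),
                softmax attention probabilities for one layer.
    Returns an array of shape (seq_len,)."""
    return attn_probs.mean(axis=(0, 1))

def prune_tokens(hidden, attn_probs, threshold):
    """Threshold-based pruning for one layer: keep only tokens whose
    importance meets the layer's (learned) threshold. Note the number of
    surviving tokens varies with the input, unlike top-k selection.

    hidden: array of shape (seq_len, hidden_dim)
    threshold: scalar learned during training (fixed here for the sketch)."""
    scores = token_importance(attn_probs)
    keep = scores >= threshold
    return hidden[keep], keep
```

For example, with 4 tokens where the last token receives little attention (each attention row `[0.3, 0.3, 0.3, 0.1]`) and a threshold of 0.2, `prune_tokens` keeps the first three tokens and drops the fourth. During training, the paper learns one such threshold per layer so that deeper layers can prune progressively more tokens.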