The Transformer architecture has become the de facto model for many machine learning tasks, ranging from natural language processing to computer vision. As such, improving its computational efficiency becomes paramount. One major computational inefficiency of Transformer-based models is that they spend the identical amount of computation throughout all layers. Prior works have proposed to augment the Transformer model with the capability of skimming tokens to improve its computational efficiency. However, they suffer from the lack of effective and end-to-end optimization of the discrete skimming predictor. To address the above limitations, we propose the Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer. The skimmed tokens are then forwarded directly to the final output, thus reducing the computation of the successive layers. The key idea of Transkimmer is to add a parameterized predictor before each layer that learns to make the skimming decision. We also propose to adopt the reparameterization trick and add a skim loss for the end-to-end training of Transkimmer. Transkimmer achieves a 10.97x average speedup on the GLUE benchmark compared with the vanilla BERT-base baseline, with less than 1% accuracy degradation.
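To make the core idea concrete, below is a minimal PyTorch sketch of a per-layer skim predictor trained with the Gumbel-softmax reparameterization trick and a skim loss, as described above. The module and function names (`SkimPredictor`, `skim_loss`) and the two-layer MLP predictor are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkimPredictor(nn.Module):
    """Per-layer predictor that decides, for each token, whether to keep or skim it."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, 2),  # logits for [skim, keep]
        )

    def forward(self, hidden_states: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        logits = self.classifier(hidden_states)
        # Gumbel-softmax (reparameterization trick) yields a near-discrete,
        # differentiable keep/skim decision during end-to-end training.
        decision = F.gumbel_softmax(logits, tau=tau, hard=True)
        keep_mask = decision[..., 1]  # 1.0 = keep the token, 0.0 = skim it
        return keep_mask  # shape: (batch, seq_len)


def skim_loss(keep_masks: list[torch.Tensor]) -> torch.Tensor:
    """Encourage skimming by penalizing the fraction of kept tokens, averaged over layers."""
    return torch.mean(torch.stack([m.mean() for m in keep_masks]))
```

In a full model, one such predictor would sit before each Transformer layer: tokens with a keep decision are processed by the layer, while skimmed tokens bypass the remaining layers and are forwarded directly to the final output. The skim loss is added to the task loss so that the predictors learn to drop tokens without hurting accuracy; the exact predictor architecture and loss weighting here are assumptions for illustration.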