We propose a novel method to sparsify attention in the Transformer model by learning to select the most-informative token representations during training, thus focusing on the task-specific parts of the input. We reduce the quadratic time and memory complexity to sublinear by means of a robust trainable top-k operator. For example, our experiments on a challenging long-document summarization task show that our method is over 3 times faster and up to 16 times more memory efficient, while significantly outperforming both dense and state-of-the-art sparse Transformer models. The method can be applied effortlessly to many models used in NLP and CV, in combination with other improvements.
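To make the idea concrete, the sketch below shows one way a trainable top-k selection of token representations could look in PyTorch. This is a minimal illustration under our own assumptions, not the paper's exact operator: a learned scorer ranks tokens, the top-k are kept, and re-weighting by the soft scores lets gradients reach the scorer. All class and parameter names (e.g. `TopKTokenSelector`, `scorer`) are hypothetical.

```python
# Minimal sketch (assumed, not the authors' implementation) of trainable
# top-k token selection: keep only the k highest-scoring token representations
# so that subsequent attention layers operate on a much shorter sequence.
import torch
import torch.nn as nn


class TopKTokenSelector(nn.Module):
    def __init__(self, d_model: int, k: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # learns token "informativeness"
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.scorer(x).squeeze(-1)           # (batch, seq_len)
        k = min(self.k, x.size(1))
        top_scores, top_idx = scores.topk(k, dim=-1)  # hard top-k selection
        idx = top_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        selected = x.gather(1, idx)                   # (batch, k, d_model)
        # Re-weighting by the soft scores keeps the scorer trainable,
        # since the hard top-k indices themselves carry no gradient.
        return selected * torch.sigmoid(top_scores).unsqueeze(-1)


# Usage: shrink a 4096-token sequence to its 256 highest-scoring tokens
# before it enters the (now much cheaper) attention layers.
selector = TopKTokenSelector(d_model=512, k=256)
tokens = torch.randn(2, 4096, 512)
pooled = selector(tokens)  # shape: (2, 256, 512)
```

Because attention then runs over only k tokens instead of the full sequence, its cost drops from quadratic in the input length toward the sublinear regime described above.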