Transformer architectures have achieved state-of-the-art results on a variety of sequence modeling tasks. However, their attention mechanism has quadratic complexity in the sequence length, making the computational overhead prohibitive, especially for long sequences. The attention context can be viewed as a random-access memory in which each token occupies one slot. Under this perspective, the memory size grows linearly with the sequence length, and so does the overhead of reading from it. One way to improve efficiency is to bound the memory size. We show that disparate approaches can be subsumed into one abstraction, attention with bounded-memory control (ABC), and that they differ in how they organize the memory. ABC reveals new, unexplored possibilities. First, it connects several efficient attention variants that would otherwise seem unrelated. Second, the abstraction yields new insights: an established approach (Wang et al., 2020b), previously thought not to be applicable to causal attention, actually is. Last, we present a new instance of ABC, which draws inspiration from existing ABC approaches but replaces their heuristic memory-organizing functions with a learned, contextualized one. Our experiments on language modeling, machine translation, and masked language model finetuning show that our approach outperforms previous efficient attention models; compared to strong transformer baselines, it significantly improves inference time and space efficiency with no or negligible loss in accuracy.
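To make the bounded-memory view concrete, below is a minimal sketch (not the paper's exact formulation) of attention over a fixed number of memory slots: each token is written into m slots via a control function, and queries then attend over those m slots instead of all n tokens, reducing the cost from O(n^2 d) to O(n m d). The names `bounded_memory_attention` and `W_phi` are illustrative, and the softmax-based control function here only loosely mirrors the learned, contextualized memory organization described above; causal masking is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bounded_memory_attention(Q, K, V, W_phi):
    """Attention over a bounded memory of m slots (m << n).

    Q, K, V: (n, d) queries, keys, values.
    W_phi: (d, m) parameters of an (assumed) memory-control function that
    decides how each token is written into the m slots.
    """
    n, d = K.shape
    # phi[i] gives token i's write weights over the m memory slots.
    phi = softmax(K @ W_phi, axis=-1)                   # (n, m)
    K_mem = phi.T @ K                                   # (m, d) bounded key memory
    V_mem = phi.T @ V                                   # (m, d) bounded value memory
    # Standard scaled dot-product attention, but over m slots only.
    attn = softmax(Q @ K_mem.T / np.sqrt(d), axis=-1)   # (n, m)
    return attn @ V_mem                                 # (n, d)


# Toy usage: a 128-token sequence, model dimension 16, memory of 8 slots.
rng = np.random.default_rng(0)
n, d, m = 128, 16, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
W_phi = rng.standard_normal((d, m))
out = bounded_memory_attention(Q, K, V, W_phi)
print(out.shape)  # (128, 16)
```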