Standard inference and training with transformer-based architectures scale quadratically with input sequence length. This cost is prohibitive for a variety of applications, especially web-page translation, query-answering, etc. Consequently, several approaches have been developed recently to speed up attention computation by enforcing different attention structures such as sparsity, low-rank structure, or kernel-based approximations of attention. In this work, we view attention computation as a nearest-neighbor retrieval problem and use decision-tree-based hierarchical navigation to reduce the retrieval cost per query token from linear in the sequence length to nearly logarithmic. Based on such hierarchical navigation, we design Treeformer, which can use one of two efficient attention layers -- TF-Attention and TC-Attention. TF-Attention computes attention in a fine-grained style, while TC-Attention is a coarse attention layer that also ensures the gradients are "dense". To optimize such challenging discrete layers, we propose a two-level bootstrapped training method. Through extensive experiments on standard NLP benchmarks, especially for long sequences, we demonstrate that our Treeformer architecture can be almost as accurate as the baseline Transformer while using 30x fewer FLOPs in the attention layer. Compared to Linformer, its accuracy can be as much as 12% higher while using a similar number of FLOPs in the attention layer.
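The following is a minimal illustrative sketch of the core idea of retrieval-based attention, not the paper's implementation: it uses simple random-projection splits as a stand-in for the learned decision trees, routes each query to a small leaf of keys in roughly logarithmic time, and computes softmax attention only over that leaf (closest in spirit to the fine-grained TF-Attention variant). All function names (`build_tree`, `route`, `tree_attention`) and the leaf-size parameter are hypothetical choices for this sketch.

```python
import numpy as np

def build_tree(keys, idx, leaf_size=16, rng=None):
    """Recursively split key indices with random hyperplanes
    (a stand-in for the learned decision trees described in the paper)."""
    rng = rng or np.random.default_rng(0)
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    w = rng.normal(size=keys.shape[1])           # random split direction
    proj = keys[idx] @ w
    t = np.median(proj)                          # balanced split threshold
    left, right = idx[proj <= t], idx[proj > t]
    if len(left) == 0 or len(right) == 0:        # degenerate split -> stop
        return {"leaf": idx}
    return {"w": w, "t": t,
            "left": build_tree(keys, left, leaf_size, rng),
            "right": build_tree(keys, right, leaf_size, rng)}

def route(node, q):
    """Navigate a single query to a leaf: ~log(n) inner products instead of n."""
    while "leaf" not in node:
        node = node["left"] if q @ node["w"] <= node["t"] else node["right"]
    return node["leaf"]

def tree_attention(Q, K, V, tree):
    """Per-query softmax attention restricted to the retrieved leaf of keys."""
    out = np.zeros((Q.shape[0], V.shape[1]))
    for i, q in enumerate(Q):
        idx = route(tree, q)
        scores = K[idx] @ q / np.sqrt(K.shape[1])
        p = np.exp(scores - scores.max())
        out[i] = (p / p.sum()) @ V[idx]
    return out

# Usage: n = 4096 keys of dimension 64; per-query attention cost scales with the
# leaf size rather than with n.
rng = np.random.default_rng(0)
K = rng.normal(size=(4096, 64)); V = rng.normal(size=(4096, 64))
Q = rng.normal(size=(8, 64))
tree = build_tree(K, np.arange(len(K)), leaf_size=32, rng=rng)
print(tree_attention(Q, K, V, tree).shape)       # (8, 64)
```

In this sketch the coarse TC-Attention variant would instead aggregate keys and values per tree node rather than attending to individual leaf entries, which is what keeps its gradients dense.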