Flowformer: Linearizing Transformers with Conservation Flows - Zhuanzhi Papers (专知论文)

Linear transformation · Attention mechanism · Linear · Inductive bias · INFORMS

February 13, 2022

Flowformer: Linearizing Transformers with Conservation Flows

Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long

Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has quadratic complexity, which significantly impedes Transformers from handling large numbers of tokens and scaling up to bigger models. Previous methods mainly use similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms. They prevent attention from degenerating into a trivial distribution by reintroducing inductive biases such as locality, at the expense of model generality and expressiveness. In this paper, we linearize Transformers without specific inductive biases, based on flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the property of flow conservation to attention and propose the Flow-Attention mechanism, which has linear complexity. By conserving the incoming flow of sinks for source competition and the outgoing flow of sources for sink allocation, Flow-Attention inherently generates informative attentions without relying on specific inductive biases. Empowered by Flow-Attention, Flowformer achieves strong performance in linear time across a wide range of areas, including long sequences, time series, vision, natural language, and reinforcement learning.
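The flow view in the abstract (values as sources, results as sinks, attention weights as flow capacities) can be illustrated with a small NumPy sketch of single-head, non-causal Flow-Attention. It is a hedged reconstruction from the abstract's description plus assumptions made here, not the authors' released code: the sigmoid feature map φ, the `eps` term, the n/m rescaling inside the allocation gate and the ×m scaling of the competition softmax are illustrative choices, and `flow_attention`, `sigmoid` and `softmax` are hypothetical helper names. What it shows is that both conservation steps and the final aggregation need only row/column sums and a d × d_v product, so the n × m attention matrix is never built and cost grows linearly with the number of tokens.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


def flow_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Illustrative single-head, non-causal Flow-Attention (assumed form).

    q: (n, d) queries, one per sink (result)
    k: (m, d) keys, one per source (value)
    v: (m, d_v) values carried by the sources
    The n x m flow-capacity matrix phi(q) @ phi(k).T is never materialized.
    """
    n, m = q.shape[0], k.shape[0]
    phi_q, phi_k = sigmoid(q), sigmoid(k)  # assumed non-negative feature map

    # Total flow into each sink i: I_i = phi(q_i) . sum_j phi(k_j)
    incoming = phi_q @ phi_k.sum(axis=0) + eps  # (n,)
    # Total flow out of each source j: O_j = phi(k_j) . sum_i phi(q_i)
    outgoing = phi_k @ phi_q.sum(axis=0) + eps  # (m,)

    # Flow into each sink when every source's outgoing flow is fixed to 1
    conserved_sink = phi_q @ (phi_k / outgoing[:, None]).sum(axis=0)  # (n,)
    # Flow out of each source when every sink's incoming flow is fixed to 1
    conserved_source = phi_k @ (phi_q / incoming[:, None]).sum(axis=0)  # (m,)

    # Source competition: softmax over conserved outgoing flow, rescaled by m
    competition = softmax(conserved_source) * m  # (m,)
    # Sink allocation: sigmoid gate on conserved incoming flow, rescaled by n/m
    allocation = sigmoid(conserved_sink * n / m)  # (n,)

    # Aggregation via associativity: (phi_q / I) @ (phi_k^T @ V_hat)
    kv = phi_k.T @ (v * competition[:, None])  # (d, d_v), cost O(m * d * d_v)
    out = (phi_q / incoming[:, None]) @ kv  # (n, d_v), cost O(n * d * d_v)
    return out * allocation[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((6, 8))
    k = rng.standard_normal((10, 8))
    v = rng.standard_normal((10, 4))
    print(flow_attention(q, k, v).shape)  # (6, 4)
```

A note on the rescaling choice: when each of the m sources emits unit flow, the conserved incoming flows of the n sinks sum to m, so their mean is m/n; multiplying by n/m centers the sigmoid gate's input around 1 regardless of the two sequence lengths. A causal (autoregressive) variant would replace the global sums with prefix sums and is not covered by this sketch.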
