Transformers in their common form are inherently limited to operating on whole token sequences rather than on one token at a time. Consequently, their use during online inference on time-series data entails considerable redundancy due to the overlap between successive token sequences. In this work, we propose novel formulations of the Scaled Dot-Product Attention, which enable Transformers to perform efficient online token-by-token inference on a continual input stream. Importantly, our modifications affect only the order of computations, while the outputs and learned weights are identical to those of the original Transformer Encoder. We validate our Continual Transformer Encoder with experiments on the THUMOS14, TVSeries and GTZAN datasets with remarkable results: our Continual one- and two-block architectures reduce the floating-point operations per prediction by up to 63x and 2.6x, respectively, while retaining predictive performance.
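To make the idea of token-by-token attention over a continual input stream concrete, the following is a minimal, hypothetical sketch of single-output scaled dot-product attention with a rolling key/value cache. It is not the paper's implementation; the class name, window parameter, and caching scheme are illustrative assumptions, and it simply shows how only the newest query needs to attend over cached keys and values when one token arrives per step.

```python
import torch
import torch.nn.functional as F


class RollingSingleOutputAttention(torch.nn.Module):
    """Hypothetical sketch: scaled dot-product attention computed for one
    new token at a time over a rolling window of cached keys/values."""

    def __init__(self, d_model: int, window: int):
        super().__init__()
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.k_proj = torch.nn.Linear(d_model, d_model)
        self.v_proj = torch.nn.Linear(d_model, d_model)
        self.window = window
        self.k_cache: list[torch.Tensor] = []  # cached keys, oldest first
        self.v_cache: list[torch.Tensor] = []  # cached values, oldest first

    def forward(self, x_t: torch.Tensor) -> torch.Tensor:
        # x_t: (d_model,) -- a single incoming token at the current time step.
        q = self.q_proj(x_t)
        self.k_cache.append(self.k_proj(x_t))
        self.v_cache.append(self.v_proj(x_t))
        if len(self.k_cache) > self.window:  # drop the oldest step
            self.k_cache.pop(0)
            self.v_cache.pop(0)
        K = torch.stack(self.k_cache)        # (n, d_model)
        V = torch.stack(self.v_cache)        # (n, d_model)
        scores = K @ q / (q.shape[-1] ** 0.5)
        attn = F.softmax(scores, dim=0)
        return attn @ V                      # (d_model,) output for this step


# Usage: feed one token per time step instead of a full sequence.
layer = RollingSingleOutputAttention(d_model=64, window=32)
for _ in range(100):
    y_t = layer(torch.randn(64))
```

In this sketch each step costs on the order of the window length rather than its square, which illustrates (in a simplified form) why reordering the attention computations can remove the redundancy of re-running a full Transformer block on heavily overlapping sequences.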