Transformers are expensive to train due to the quadratic time and space complexity of the self-attention mechanism. On the other hand, although kernel machines suffer from the same computational bottleneck in pairwise dot products, several approximation schemes have been successfully incorporated to considerably reduce their computational cost without sacrificing much accuracy. In this work, we leverage computation methods developed for kernel machines to alleviate this high cost and introduce Skyformer, which replaces the softmax structure with a Gaussian kernel to stabilize model training and adapts the Nystr\"om method to a non-positive semidefinite matrix to accelerate the computation. We further provide a theoretical analysis showing that the matrix approximation error of the proposed method is small in spectral norm. Experiments on the Long Range Arena benchmark show that the proposed method achieves comparable or even better performance than full self-attention while requiring fewer computational resources.
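To make the core idea concrete, the following is a minimal NumPy sketch of a plain Nystr\"om approximation of a Gaussian (RBF) kernel matrix, the standard positive-semidefinite setting. It is an illustrative toy rather than the Skyformer implementation (which adapts Nystr\"om to a non-positive semidefinite matrix); the function names, the uniform landmark sampling, and the bandwidth choice are our own assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def nystrom_approx(X, num_landmarks=32, sigma=1.0, seed=None):
    # Nystrom approximation of the n x n kernel matrix K(X, X):
    #   K  ~=  C  W^+  C.T,
    # where C = K(X, landmarks) is n x m and W = K(landmarks, landmarks) is m x m.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(num_landmarks, n), replace=False)
    landmarks = X[idx]
    C = gaussian_kernel(X, landmarks, sigma)          # n x m
    W = gaussian_kernel(landmarks, landmarks, sigma)  # m x m
    # Formed densely here only for the error check below; in practice one keeps
    # the three factors to stay in O(n * m) time and memory.
    return C @ np.linalg.pinv(W) @ C.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((512, 64))                # e.g. 512 tokens, head dim 64
    K_exact = gaussian_kernel(X, X)
    K_approx = nystrom_approx(X, num_landmarks=64, seed=0)
    err = np.linalg.norm(K_exact - K_approx, ord=2) / np.linalg.norm(K_exact, ord=2)
    print(f"relative spectral-norm error: {err:.3f}")
```

The sketch reports the relative error in spectral norm, the same metric used in the paper's approximation guarantee, though the bound there concerns the adapted, non-PSD setting rather than this textbook case.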