Is Attention Better Than Matrix Decomposition? - 专知论文

September 9, 2021

Is Attention Better Than Matrix Decomposition?

Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin

From arXiv, ICLR 2021

As an essential ingredient of modern deep learning, attention mechanism, especially self-attention, plays a vital role in the global correlation discovery. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago regarding the performance and computational cost for encoding the long-distance dependencies. We model the global context issue as a low-rank recovery problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module self-attention when carefully coping with gradients back-propagated through MDs. Comprehensive experiments are conducted in the vision tasks where it is crucial to learn the global context, including semantic segmentation and image generation, demonstrating significant improvements over self-attention and its variants.
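
The idea of factorizing an input representation into sub-matrices and reconstructing a low-rank embedding can be illustrated with a small sketch. The example below uses non-negative matrix factorization (NMF) with the classic multiplicative update rules; NMF is chosen here only as an illustrative decomposition, since the abstract does not say which MDs the Hamburgers use. The function name `nmf_low_rank`, the matrix sizes, and the iteration count are all assumptions made for this example, and the code is not the authors' implementation (in particular, it does not address back-propagating gradients through the decomposition).

```python
import numpy as np

def nmf_low_rank(X, rank, n_iter=200, eps=1e-8, seed=0):
    """Factorize a non-negative matrix X (d x n) as D @ C.

    D (d x rank) plays the role of a dictionary and C (rank x n) the codes.
    Returns D, C and the low-rank reconstruction D @ C.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.random((d, rank))
    C = rng.random((rank, n))
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius objective ||X - D C||^2.
        # Every factor is non-negative, so D and C stay non-negative.
        C *= (D.T @ X) / (D.T @ D @ C + eps)
        D *= (X @ C.T) / (D @ C @ C.T + eps)
    return D, C, D @ C

# Hypothetical input: 64 "positions" with 16 channels each, built to be
# approximately rank 3 plus a small non-negative perturbation.
rng = np.random.default_rng(1)
X = rng.random((16, 3)) @ rng.random((3, 64))
X_noisy = X + 0.01 * rng.random(X.shape)

D, C, X_hat = nmf_low_rank(X_noisy, rank=3)
rel_err = np.linalg.norm(X_noisy - X_hat) / np.linalg.norm(X_noisy)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each column of `X_hat` is a combination of the same `rank` dictionary columns, so every position is reconstructed from components shared across the whole input. That is one way to read the abstract's framing of global context as a low-rank recovery problem.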
