Transformer has shown great successes in natural language processing, computer vision, and audio processing. As one of its core components, softmax attention helps to capture long-range dependencies, yet prohibits scaling up due to its quadratic space and time complexity with respect to the sequence length. Kernel methods are often adopted to reduce the complexity by approximating the softmax operator. Nevertheless, due to approximation errors, their performance varies across tasks and corpora and suffers significant drops compared with vanilla softmax attention. In this paper, we propose a linear transformer called cosFormer that achieves comparable or better accuracy than the vanilla transformer in both causal and cross attention. cosFormer is based on two key properties of softmax attention: (i) the non-negativity of the attention matrix; (ii) a non-linear re-weighting scheme that concentrates the distribution of the attention matrix. As a linear substitute, cosFormer fulfills these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of our method. We further examine our method on long sequences and achieve state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer.
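To make the mechanism concrete, the following is a minimal, non-causal PyTorch sketch of linear attention with a non-negative (ReLU) feature map and the cosine-based distance re-weighting described above. The function name, tensor shapes, and the choice of the length scale M (set to the sequence length here) are illustrative assumptions, not the authors' exact implementation; see the repository above for that.

```python
import math

import torch
import torch.nn.functional as F


def cosformer_attention(q, k, v, eps=1e-6):
    """Sketch of non-causal linear attention with cosine re-weighting.

    q, k: (batch, n, d); v: (batch, n, e). Cost is O(n * d * e), i.e.
    linear in the sequence length n, instead of O(n^2) for softmax attention.
    """
    b, n, d = q.shape
    m = n  # illustrative choice of M, the re-weighting length scale

    # Property (i): non-negative feature map (ReLU here).
    q, k = F.relu(q), F.relu(k)

    # Property (ii): re-weight entry (i, j) by cos(pi/2 * (i - j) / M).
    # cos(a - b) = cos(a)cos(b) + sin(a)sin(b) splits the weight into
    # per-row and per-column factors, which preserves linear complexity.
    idx = torch.arange(1, n + 1, device=q.device, dtype=q.dtype)
    cos_w = torch.cos(math.pi / 2 * idx / m)[None, :, None]  # (1, n, 1)
    sin_w = torch.sin(math.pi / 2 * idx / m)[None, :, None]
    q_cos, q_sin = q * cos_w, q * sin_w
    k_cos, k_sin = k * cos_w, k * sin_w

    # Associativity: contract K with V first (d x e matrices per batch).
    kv_cos = torch.einsum('bnd,bne->bde', k_cos, v)
    kv_sin = torch.einsum('bnd,bne->bde', k_sin, v)
    num = (torch.einsum('bnd,bde->bne', q_cos, kv_cos)
           + torch.einsum('bnd,bde->bne', q_sin, kv_sin))

    # Row-wise normalization (the denominator of the attention weights).
    den = (torch.einsum('bnd,bd->bn', q_cos, k_cos.sum(dim=1))
           + torch.einsum('bnd,bd->bn', q_sin, k_sin.sum(dim=1)))
    return num / den.clamp(min=eps).unsqueeze(-1)
```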