The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as typically thought for pretrained language models. We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones -- the average attention weights over multiple inputs. We use PAPA to analyze several established pretrained Transformers on six downstream tasks. We find that without any input-dependent attention, all models achieve competitive performance -- an average relative drop of only 8% from the probing baseline. Further, little or no performance drop is observed when replacing half of the input-dependent attention matrices with constant (input-independent) ones. Interestingly, we show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success. Our results motivate research on simpler alternatives to input-dependent attention, as well as on methods for better utilization of this mechanism in the Transformer architecture.
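To make the described replacement concrete, below is a minimal sketch (not the authors' released implementation) of what swapping input-dependent attention for constant, averaged attention could look like in PyTorch. The class and function names (`ConstantAttention`, `average_attention`, `model_attn_fn`) are hypothetical, and padding/length handling is simplified.

```python
import torch


class ConstantAttention(torch.nn.Module):
    """Sketch of constant (input-independent) attention.

    The input-dependent weights softmax(QK^T / sqrt(d)) are replaced with a
    fixed per-head matrix: attention weights averaged over a sample of inputs.
    """

    def __init__(self, avg_weights: torch.Tensor):
        # avg_weights: (num_heads, max_len, max_len), precomputed offline by
        # averaging each head's attention matrices over a corpus of inputs.
        super().__init__()
        self.register_buffer("avg_weights", avg_weights)

    def forward(self, value: torch.Tensor) -> torch.Tensor:
        # value: (batch, num_heads, seq_len, head_dim)
        seq_len = value.size(2)
        # Crop the constant weights to the current sequence length and
        # renormalize rows so they still sum to one.
        w = self.avg_weights[:, :seq_len, :seq_len]
        w = w / w.sum(dim=-1, keepdim=True)
        # Usual weighted sum over values, but the weights ignore the input.
        return torch.einsum("hqk,bhkd->bhqd", w, value)


def average_attention(model_attn_fn, inputs, num_heads, max_len):
    """Estimate per-head average attention weights over a sample of inputs.

    model_attn_fn is assumed to return (num_heads, seq_len, seq_len) attention
    probabilities for a single input.
    """
    total = torch.zeros(num_heads, max_len, max_len)
    count = torch.zeros(max_len, max_len)
    for x in inputs:
        attn = model_attn_fn(x)              # (num_heads, L, L)
        L = attn.size(-1)
        total[:, :L, :L] += attn
        count[:L, :L] += 1
    return total / count.clamp(min=1)
```

In use, one would precompute `average_attention(...)` on held-out data and substitute a `ConstantAttention` module for the attention-weight computation in some or all heads, then measure the downstream performance drop relative to the unmodified model.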