Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite limited in power (Hahn, 2020), as they can be simulated by constant-depth AND/OR circuits (Hao et al., 2021). However, hard attention is a strong assumption, which may limit the relevance of these results to transformers used in practice. In this work, we analyze the circuit complexity of transformers with saturated attention: a generalization of hard attention that more closely captures the attention patterns learnable in practical transformers. We first show that saturated transformers transcend the known limitations of hard-attention transformers. We then prove that saturated transformers with floating-point values can be simulated by constant-depth threshold circuits, giving the class $\mathsf{TC}^0$ as an upper bound on the formal languages they recognize.
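
For intuition, saturated attention can be characterized as the limit of softmax attention as the attention scores are scaled toward infinity, so that the attention mass spreads uniformly over the score-maximizing positions. The sketch below uses illustrative notation ($x_j$ for input representations, $Q, K, V$ for the query, key, and value transformations, $s_{ij}$ for scores, $\mathcal{M}(i)$ for the maximizer set); these symbols are expository assumptions, not a fixed convention.

% A minimal formalization: saturated attention as the sharp limit of softmax attention,
% which becomes a uniform average over the argmax set of positions.
\[
  s_{ij} = (Q x_i) \cdot (K x_j), \qquad
  \mathcal{M}(i) = \operatorname*{arg\,max}_{j} \, s_{ij},
\]
\[
  \mathrm{satt}(x)_i
    \;=\; \lim_{c \to \infty} \sum_{j} \mathrm{softmax}\bigl(c\, s_{i1}, \ldots, c\, s_{in}\bigr)_j \, V x_j
    \;=\; \frac{1}{\lvert \mathcal{M}(i) \rvert} \sum_{j \in \mathcal{M}(i)} V x_j .
\]

Hard attention instead selects a single position from $\mathcal{M}(i)$ via a tie-breaking rule, so the two coincide whenever the maximizer is unique; averaging over ties is what lets saturated attention aggregate uniformly over many positions, the capability underlying its separation from hard attention.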