Transformers have become a standard architecture for many NLP problems. This has motivated theoretical analysis of their capabilities as models of language, in order to understand what makes them successful and where their potential weaknesses lie. Recent work has shown that transformers with hard attention are quite limited in capacity, and in fact can be simulated by constant-depth circuits. However, hard attention is a restrictive assumption, which may limit the relevance of these results for practical transformers. In this work, we analyze the circuit complexity of transformers with saturated attention: a generalization of hard attention that more closely captures the attention patterns learnable in practical transformers. We show that saturated transformers transcend the limitations of hard-attention transformers. Under some minor assumptions, we prove that the number of bits needed to represent a saturated transformer's memory vector is $O(\log n)$, which implies that saturated transformers can be simulated by log-depth circuits. Thus, the jump from hard to saturated attention can be understood as increasing the transformer's effective circuit depth by a factor of $O(\log n)$.
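For intuition, the display below sketches one standard way to formalize saturated attention, namely as the limiting behavior of softmax attention when the attention scores are scaled to infinity; the notation ($q$, $K$, $V$, $\mathcal{M}$) is illustrative and not taken verbatim from the text above.
\[
\mathrm{s\text{-}attn}(q, K, V) \;=\; \lim_{c \to \infty} \mathrm{softmax}\!\left(c\, q K^\top\right) V \;=\; \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} v_i,
\qquad
\mathcal{M} = \operatorname*{arg\,max}_{1 \le i \le n} \; q \cdot k_i .
\]
Hard attention corresponds to the special case in which a single maximizing position receives all of the attention mass; saturated attention instead averages uniformly over all tied maximizers, which is what lets it aggregate information from more than one position and, in this sense, generalize hard attention.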