Following the success of dot-product attention in Transformers, numerous approximations have recently been proposed to address its quadratic complexity with respect to the input length. However, all approximations thus far have ignored the contribution of the $\textit{value vectors}$ to the quality of the approximation. In this work, we argue that research efforts should be directed towards approximating the true output of the attention sub-layer, which includes the value vectors. We propose a value-aware objective, and show theoretically and empirically that, in the context of language modeling, an optimal approximation under a value-aware objective substantially outperforms an optimal approximation that ignores values. Moreover, we show that the choice of kernel function for computing attention similarity can substantially affect the quality of sparse approximations, where kernel functions that are less skewed are more affected by the value vectors.
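To make the two objectives concrete, the contrast can be sketched as follows (a sketch using standard attention notation $Q$, $K$, $V$ and the Frobenius norm as an illustrative error measure; none of these symbols are fixed by the abstract itself). Let $A = \mathrm{softmax}\big(QK^\top/\sqrt{d}\big)$ denote the true attention matrix and $\hat{A}$ a sparse approximation of it. A value-oblivious objective measures error in the attention weights alone,
$$\min_{\hat{A}} \big\|A - \hat{A}\big\|_F,$$
whereas a value-aware objective measures error in the true output of the attention sub-layer, which includes the value vectors:
$$\min_{\hat{A}} \big\|AV - \hat{A}V\big\|_F.$$
Intuitively, under the value-aware objective an error on a position whose value vector has small norm contributes little to the output error, so the optima of the two objectives can differ substantially.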