"只要你关注" "只要你需要关注" (Attention Is All You Need) - 专知论文

会员服务 ·

0

BLEU · MoDELS · 注意力机制 · Transformer · Networking ·

2017 年 12 月 6 日

Attention Is All You Need

翻译："只要你关注" "只要你需要关注"

Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez,Lukasz Kaiser,Illia Polosukhin

from arxiv, 15 pages, 5 figures

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

翻译：主导序列转换模型以复杂的重复或进化神经网络为基础,以编码器脱coder-decoder配置为主。最有效果的模型还将编码器和解码器通过关注机制连接起来。我们建议了一个新的简单的网络结构,即完全基于关注机制的变异器,完全避免重现和卷发。在两个机器翻译任务上进行的实验显示,这些模型质量优,同时更加平行,培训时间要少得多。我们的模型在WMT 2014 英文到德文翻译任务上达到了28.4 BLEU,改进了现有的最佳结果,包括由2个BLEU组成的组合。在WMT 2014 英文到法文翻译任务上,我们的模型在八个GPU培训了3.5天之后,确定了新的BLEU最新分数为41.8,这是最佳模型培训成本的一小部分。我们显示,变异器通过成功地将它应用到英语选区,用大而有限的培训数据进行分类,从而对其他任务进行了很好的概括。

27

相关内容

BLEU

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

171+阅读 · 2019年10月13日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

专知

53+阅读 · 2019年4月12日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

从Seq2seq到Attention模型到Self Attention（一）

从Seq2seq到Attention模型到Self Attention（一）

量化投资与机器学习

76+阅读 · 2018年10月8日

基于attention的seq2seq机器翻译实践详解

基于attention的seq2seq机器翻译实践详解

黑龙江大学自然语言处理实验室

11+阅读 · 2018年3月14日

一文读懂「Attention is All You Need」| 附代码实现

一文读懂「Attention is All You Need」| 附代码实现

PaperWeekly

37+阅读 · 2018年1月10日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

论文共读 | Attention is All You Need

论文共读 | Attention is All You Need

黑龙江大学自然语言处理实验室

14+阅读 · 2017年9月7日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

Attention is All You Need | 每周一起读

Attention is All You Need | 每周一起读

PaperWeekly

10+阅读 · 2017年6月28日

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

Arxiv

9+阅读 · 2020年3月25日

Talking-Heads Attention

Talking-Heads Attention

Arxiv

15+阅读 · 2020年3月5日

Attention Is (not) All You Need for Commonsense Reasoning

Arxiv

7+阅读 · 2019年5月31日

Universal Transformers

Universal Transformers

Arxiv

5+阅读 · 2019年3月5日

The Evolved Transformer

The Evolved Transformer

Arxiv

5+阅读 · 2019年1月30日

Pay Less Attention with Lightweight and Dynamic Convolutions

Pay Less Attention with Lightweight and Dynamic Convolutions

Arxiv

4+阅读 · 2019年1月29日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

You May Not Need Attention

Arxiv

4+阅读 · 2018年10月31日

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

Arxiv

4+阅读 · 2018年4月23日

Pay More Attention - Neural Architectures for Question-Answering

Arxiv

5+阅读 · 2018年3月25日

VIP会员

文章信息

相关主题

注意力机制

相关VIP内容

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

171+阅读 · 2019年10月13日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《毁灭算法：解析以色列在加沙的AI军事行动》

【COLT 2025最新教程】语言生成

以机器速度锁定目标：人工智能的能力与局限

【ICML2025】通过在线世界模型规划的持续强化学习

相关资讯

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

专知

53+阅读 · 2019年4月12日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

从Seq2seq到Attention模型到Self Attention（一）

从Seq2seq到Attention模型到Self Attention（一）

量化投资与机器学习

76+阅读 · 2018年10月8日

基于attention的seq2seq机器翻译实践详解

基于attention的seq2seq机器翻译实践详解

黑龙江大学自然语言处理实验室

11+阅读 · 2018年3月14日

一文读懂「Attention is All You Need」| 附代码实现

一文读懂「Attention is All You Need」| 附代码实现

PaperWeekly

37+阅读 · 2018年1月10日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

论文共读 | Attention is All You Need

论文共读 | Attention is All You Need

黑龙江大学自然语言处理实验室

14+阅读 · 2017年9月7日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

Attention is All You Need | 每周一起读

Attention is All You Need | 每周一起读

PaperWeekly

10+阅读 · 2017年6月28日

相关论文

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

Arxiv

9+阅读 · 2020年3月25日

Talking-Heads Attention

Talking-Heads Attention

Arxiv

15+阅读 · 2020年3月5日

Attention Is (not) All You Need for Commonsense Reasoning

Arxiv

7+阅读 · 2019年5月31日

Universal Transformers

Universal Transformers

Arxiv

5+阅读 · 2019年3月5日

The Evolved Transformer

The Evolved Transformer

Arxiv

5+阅读 · 2019年1月30日

Pay Less Attention with Lightweight and Dynamic Convolutions

Pay Less Attention with Lightweight and Dynamic Convolutions

Arxiv

4+阅读 · 2019年1月29日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

You May Not Need Attention

Arxiv

4+阅读 · 2018年10月31日

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

Arxiv

4+阅读 · 2018年4月23日

Pay More Attention - Neural Architectures for Question-Answering

Arxiv

5+阅读 · 2018年3月25日

微信扫码咨询专知VIP会员