Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
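To make the N-step unroll concrete, below is a minimal, hypothetical PyTorch sketch. The GRU stand-in decoder, the soft-embedding feedback, and all sizes are illustrative assumptions rather than the paper's exact setup; the point is only the structure: step 1 is ordinary teacher forcing, and steps 2..N re-feed the previous step's predictions along a secondary time axis so the loss covers N prediction steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, for illustration only.
VOCAB, DIM, N_STEPS = 100, 32, 3

embed = nn.Embedding(VOCAB, DIM)
decoder = nn.GRU(DIM, DIM, batch_first=True)  # stands in for any decoder
project = nn.Linear(DIM, VOCAB)

def teaforn_loss(gold, n_steps=N_STEPS):
    """Sketch of a TeaForN-style unroll: the first pass is standard
    teacher forcing; each later pass re-feeds the previous pass's
    (soft) predictions, so gradients flow across N prediction steps."""
    inputs = embed(gold[:, :-1])          # gold prefix as first-step input
    total = 0.0
    for k in range(1, n_steps + 1):
        hidden, _ = decoder(inputs)       # same shared decoder every pass
        logits = project(hidden)
        # pass k predicts the token k positions ahead of each input position
        targets = gold[:, k:]
        steps = targets.size(1)
        total = total + F.cross_entropy(
            logits[:, :steps].reshape(-1, VOCAB), targets.reshape(-1))
        # feed the expected embedding of the prediction to the next pass
        # (one differentiable feedback choice; an assumption here)
        inputs = torch.softmax(logits, dim=-1) @ embed.weight
    return total / n_steps

loss = teaforn_loss(torch.randint(0, VOCAB, (4, 12)))
loss.backward()
```

Note the minimal departure from teacher forcing: with `n_steps=1` the loop reduces exactly to the standard setup, which is consistent with the claim that TeaForN requires only small modifications.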