Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation - 专知 (Zhuanzhi) Papers


NMT · CMLM · Machine Translation · Confidence · MoDELS

February 28, 2022

Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation


Chulun Zhou, Fandong Meng, Jie Zhou, Min Zhang, Hongji Wang, Jinsong Su

From arXiv (pre-print version); accepted at ACL 2022 as a long paper in the main conference.

Most dominant neural machine translation (NMT) models are restricted to make predictions only according to the local context of preceding words in a left-to-right manner. Although many previous studies try to incorporate global information into NMT models, there still exist limitations on how to effectively exploit bidirectional global context. In this paper, we propose a Confidence Based Bidirectional Global Context Aware (CBBGCA) training framework for NMT, where the NMT model is jointly trained with an auxiliary conditional masked language model (CMLM). The training consists of two stages: (1) multi-task joint training; (2) confidence based knowledge distillation. At the first stage, by sharing encoder parameters, the NMT model is additionally supervised by the signal from the CMLM decoder that contains bidirectional global contexts. Moreover, at the second stage, using the CMLM as teacher, we further pertinently incorporate bidirectional global context to the NMT model on its unconfidently-predicted target words via knowledge distillation. Experimental results show that our proposed CBBGCA training framework significantly improves the NMT model by +1.02, +1.30 and +0.57 BLEU scores on three large-scale translation datasets, namely WMT'14 English-to-German, WMT'19 Chinese-to-English and WMT'14 English-to-French, respectively.
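The two training stages described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not say how confidence is measured or how the losses are weighted, so the function names, the use of the NMT probability of the gold token as the confidence score, the `threshold`, and the `cmlm_weight` and `kd_weight` parameters are all assumptions made for illustration. The sketch also treats every target position as masked for the CMLM, which a real CMLM would not do.

```python
import math

def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold token under one distribution."""
    return -math.log(probs[gold])

def kl_divergence(teacher, student):
    """KL(teacher || student) over one vocabulary distribution."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)

def stage1_loss(nmt_probs, cmlm_probs, gold, cmlm_weight=1.0):
    """Stage 1 (sketch): multi-task joint loss.

    In the paper the NMT and CMLM decoders share an encoder; here we only
    combine their per-token losses. cmlm_weight is a hypothetical knob.
    """
    nmt_ce = sum(cross_entropy(p, y) for p, y in zip(nmt_probs, gold))
    cmlm_ce = sum(cross_entropy(p, y) for p, y in zip(cmlm_probs, gold))
    return nmt_ce + cmlm_weight * cmlm_ce

def stage2_loss(nmt_probs, cmlm_probs, gold, threshold=0.5, kd_weight=1.0):
    """Stage 2 (sketch): confidence based knowledge distillation.

    Assumption: a target word counts as "unconfidently predicted" when the
    NMT probability of the gold token is below `threshold`. Only those
    positions receive the distillation term from the CMLM teacher.
    """
    total = 0.0
    distilled = []  # indices of positions that received the KD term
    for i, (student, teacher, y) in enumerate(zip(nmt_probs, cmlm_probs, gold)):
        total += cross_entropy(student, y)
        if student[y] < threshold:
            total += kd_weight * kl_divergence(teacher, student)
            distilled.append(i)
    return total, distilled

# Toy example: two target positions, vocabulary of three tokens.
nmt = [[0.9, 0.05, 0.05], [0.3, 0.4, 0.3]]    # NMT (student) distributions
cmlm = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]     # CMLM (teacher) distributions
gold = [0, 1]
loss, distilled = stage2_loss(nmt, cmlm, gold, threshold=0.5)
print(distilled)  # only the second position falls below the threshold
```

In this toy case the NMT model assigns 0.9 to the first gold token and 0.4 to the second, so only the second position is distilled from the teacher. Restricting distillation to low-confidence positions is what the abstract means by incorporating bidirectional context "on its unconfidently-predicted target words".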



