Most dominant neural machine translation (NMT) models are restricted to making predictions based only on the local context of preceding words, in a left-to-right manner. Although many previous studies have tried to incorporate global information into NMT models, there remain limitations in how to effectively exploit bidirectional global context. In this paper, we propose a Confidence Based Bidirectional Global Context Aware (CBBGCA) training framework for NMT, in which the NMT model is jointly trained with an auxiliary conditional masked language model (CMLM). The training consists of two stages: (1) multi-task joint training and (2) confidence-based knowledge distillation. In the first stage, by sharing encoder parameters, the NMT model is additionally supervised by the signal from the CMLM decoder, which contains bidirectional global context. In the second stage, using the CMLM as a teacher, we further incorporate bidirectional global context into the NMT model specifically on its unconfidently-predicted target words via knowledge distillation. Experimental results show that our proposed CBBGCA training framework significantly improves the NMT model by +1.02, +1.30 and +0.57 BLEU scores on three large-scale translation datasets, namely WMT'14 English-to-German, WMT'19 Chinese-to-English and WMT'14 English-to-French, respectively.
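To make the second stage concrete, the following is a minimal PyTorch-style sketch of confidence-based knowledge distillation, assuming a simple per-token probability threshold (`conf_threshold`) to flag unconfident predictions and an unweighted sum of the translation and distillation losses; the function name, signature, and thresholding scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def confidence_based_kd_loss(student_logits, teacher_logits, gold_ids,
                             pad_id, conf_threshold=0.5):
    """Sketch of stage-2 training (assumed interface, not the official code).

    student_logits: [B, T, V] from the left-to-right NMT decoder.
    teacher_logits: [B, T, V] from the CMLM teacher with bidirectional context.
    gold_ids:       [B, T] reference target token ids.
    """
    log_probs = F.log_softmax(student_logits, dim=-1)                     # [B, T, V]
    gold_logp = log_probs.gather(-1, gold_ids.unsqueeze(-1)).squeeze(-1)  # [B, T]
    confidence = gold_logp.exp()  # student's probability of the gold token

    non_pad = gold_ids.ne(pad_id)
    # Positions where the NMT model is unconfident receive the teacher signal.
    unconfident = (confidence < conf_threshold) & non_pad

    # Standard negative log-likelihood on all non-padding positions.
    nll = -gold_logp[non_pad].mean()

    # Distill the CMLM's distribution into the student only at unconfident
    # positions: KL(teacher || student) with the student in log-space.
    if unconfident.any():
        teacher_probs = F.softmax(teacher_logits, dim=-1)
        kd = F.kl_div(log_probs[unconfident], teacher_probs[unconfident],
                      reduction="batchmean")
    else:
        kd = student_logits.new_zeros(())

    return nll + kd
```

In practice one would tune how unconfident positions are selected (e.g., a fixed fraction of lowest-confidence words rather than a hard threshold) and how the two loss terms are weighted.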