Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, $F_2$-XEnDec, that combines self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation benchmarks, WMT'14 English-German and WMT'14 English-French, demonstrate that our approach achieves substantial improvements over several strong baselines and obtains a new state of the art of 46.19 BLEU on English-French when incorporating back translation. Results also show that our approach improves model robustness to input perturbations such as code-switching noise, which frequently appears on social media.
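To make the "interbreeding" idea concrete, the sketch below illustrates a token-level crossover between the source side of a parallel pair and a monolingual sentence. This is a minimal, hypothetical illustration only: the abstract does not specify the mechanism, and the actual crossover encoder-decoder operates on the full encoder-decoder pipeline (including the target side and training losses), so the function name, the padding scheme, and the position-wise swap rule here are assumptions made for exposition.

```python
# Illustrative sketch (not the paper's exact procedure): interbreed two source
# sentences by randomly taking each position from one of the two parents.
import random
from typing import List, Tuple


def crossover_sources(parallel_src: List[str],
                      mono_src: List[str],
                      swap_prob: float = 0.5,
                      seed: int = 0) -> Tuple[List[str], List[int]]:
    """Combine two source sentences into one crossed sentence.

    Returns the crossed token sequence and a 0/1 mask recording which parent
    each position came from (0 = parallel source, 1 = monolingual source).
    """
    rng = random.Random(seed)
    length = max(len(parallel_src), len(mono_src))
    pad = "<pad>"
    a = parallel_src + [pad] * (length - len(parallel_src))
    b = mono_src + [pad] * (length - len(mono_src))

    crossed, mask = [], []
    for tok_a, tok_b in zip(a, b):
        take_b = rng.random() < swap_prob
        crossed.append(tok_b if take_b else tok_a)
        mask.append(1 if take_b else 0)
    return crossed, mask


if __name__ == "__main__":
    src_parallel = "the cat sat on the mat".split()
    src_mono = "ein Hund läuft im Park".split()
    crossed, mask = crossover_sources(src_parallel, src_mono)
    print(crossed)  # mixed English/German tokens, resembling code-switching noise
    print(mask)
```

The mixed output also hints at why such training can improve robustness to code-switching noise: the model routinely sees source inputs whose tokens are drawn from more than one sentence.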