Previous work mainly focuses on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or on improving supervised machine translation with BERT. However, whether the MPE can facilitate the cross-lingual transferability of an NMT model remains under-explored. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT. In this task, the NMT model is trained with a parallel dataset of only one language pair and an off-the-shelf MPE, and is then directly tested on zero-shot language pairs. We propose SixT, a simple yet effective model for this task. SixT leverages the MPE with a two-stage training schedule and achieves further improvement with a position-disentangled encoder and a capacity-enhanced decoder. With this method, SixT significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages. Furthermore, with much less training computation cost and training data, our model achieves better performance on 15 any-to-English test sets than CRISS and m2m-100, two strong multilingual NMT baselines.
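To make the two-stage training schedule concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: stage one trains only the decoder on top of the frozen pretrained encoder, and stage two fine-tunes the model jointly. The module sizes, learning rates, and exact choices of which sub-modules stay frozen are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the pretrained multilingual encoder (e.g. XLM-R)
# and a randomly initialised Transformer decoder; dimensions are illustrative.
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6)
decoder = nn.TransformerDecoder(nn.TransformerDecoderLayer(d_model=512, nhead=8), num_layers=6)

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = trainable

# Stage 1: keep the pretrained encoder frozen and train only the decoder,
# so the decoder learns to consume the MPE's multilingual representations.
set_trainable(encoder, False)
set_trainable(decoder, True)
stage1_optim = torch.optim.Adam(
    [p for p in decoder.parameters() if p.requires_grad], lr=5e-4)

# Stage 2: unfreeze the encoder and fine-tune jointly at a lower learning
# rate; which parts remain frozen in SixT is a design choice of the paper.
set_trainable(encoder, True)
stage2_optim = torch.optim.Adam(
    [p for p in list(encoder.parameters()) + list(decoder.parameters())
     if p.requires_grad], lr=1e-4)
```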