This paper demonstrates that multilingual pretraining, a proper fine-tuning method, and a large-scale parallel dataset from multiple auxiliary languages are all critical for zero-shot translation, where the NMT model is tested on source languages unseen during supervised training. Following this idea, we present SixT++, a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages. SixT++ initializes the decoder embedding and the full encoder with XLM-R large, and then trains the encoder and decoder layers with a simple two-stage training strategy. SixT++ achieves impressive performance on many-to-English translation. It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU, respectively. Additionally, SixT++ offers a set of model parameters that can be further fine-tuned to develop unsupervised NMT models for low-resource languages. With back-translation on monolingual data of the low-resource language, it outperforms all current state-of-the-art unsupervised methods on Nepali and Sinhala for both translating into and from English.
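The sketch below is a minimal PyTorch illustration of the setup described above, not the authors' released implementation: the full encoder and the decoder embedding are initialized from XLM-R large (via HuggingFace transformers), a randomly initialized decoder is added on top, and training proceeds in two stages. The decoder size and the exact freezing schedule of the two stages are assumptions for illustration, not details taken from the paper.

```python
import torch.nn as nn
from transformers import XLMRobertaModel

# Full encoder and decoder embedding are initialized from XLM-R large.
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")
hidden = encoder.config.hidden_size   # 1024 for xlm-roberta-large
vocab = encoder.config.vocab_size

decoder_embed = nn.Embedding.from_pretrained(
    encoder.get_input_embeddings().weight.detach().clone(), freeze=False
)

# Randomly initialized Transformer decoder stack (layer count and head count
# are placeholders, not the paper's configuration).
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=hidden, nhead=16, batch_first=True),
    num_layers=12,
)
output_proj = nn.Linear(hidden, vocab)

def set_trainable(trainable):
    """Freeze all components, then unfreeze those listed in `trainable`."""
    for module in (encoder, decoder, decoder_embed, output_proj):
        flag = module in trainable
        for p in module.parameters():
            p.requires_grad = flag

# Stage 1 (assumed schedule): keep the pretrained XLM-R parameters frozen and
# train only the new decoder layers and output projection on the
# six-language parallel data.
set_trainable({decoder, output_proj})
# ... supervised training loop for stage 1 ...

# Stage 2 (assumed schedule): unfreeze the remaining components and continue
# fine-tuning end to end on the same parallel data.
set_trainable({encoder, decoder, decoder_embed, output_proj})
# ... supervised training loop for stage 2 ...
```

Because the encoder and decoder embedding come from a multilingual pretrained model covering 100 languages, the resulting many-to-English model can accept source languages never seen in the supervised parallel data.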