We propose a two-stage training approach for developing a single NMT model to translate unseen languages both to and from English. For the first stage, we initialize an encoder-decoder model with pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data from 25 languages into English. We find this model can generalize to zero-shot translation of unseen languages. For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then train with successive rounds of back-translation. The final model extends to the English-to-Many direction, while retaining Many-to-English performance. We term our approach EcXTra (English-centric Crosslingual (X) Transfer). Our approach sequentially leverages auxiliary parallel data and monolingual data, and is conceptually simple, using only a standard cross-entropy objective in both stages. The final EcXTra model is evaluated on unsupervised NMT for 8 low-resource languages, achieving a new state-of-the-art for English-to-Kazakh (22.3 > 10.4 BLEU) and competitive performance for the other 15 translation directions.
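To make the two-stage recipe concrete, the following is a minimal sketch using the Hugging Face `transformers` library. The model names (`xlm-roberta-base`, `roberta-base`), hyperparameters, and toy sentences are illustrative assumptions, not the paper's exact configuration, and the tokenizer handling for non-English targets is deliberately simplified.

```python
# Hedged sketch of the EcXTra two-stage recipe (assumptions: transformers + torch,
# base-size checkpoints, toy data). Not the authors' implementation.
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

# --- Stage 1: warm-start an encoder-decoder from XLM-R (encoder) and RoBERTa
# (decoder), then fine-tune on many-to-English parallel data with the standard
# cross-entropy objective.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "roberta-base"
)
src_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
tgt_tok = AutoTokenizer.from_pretrained("roberta-base")
model.config.decoder_start_token_id = tgt_tok.cls_token_id
model.config.pad_token_id = tgt_tok.pad_token_id

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def xent_step(src_texts, tgt_texts):
    """One teacher-forced cross-entropy step on a (source, target) batch."""
    enc = src_tok(src_texts, return_tensors="pt", padding=True, truncation=True)
    lab = tgt_tok(tgt_texts, return_tensors="pt", padding=True, truncation=True)
    labels = lab.input_ids.masked_fill(lab.input_ids == tgt_tok.pad_token_id, -100)
    out = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

# Placeholder many-to-English parallel batch (25 languages in the paper).
stage1_loss = xent_step(["Wie geht es dir?"], ["How are you?"])

# --- Stage 2: back-translation. Use the stage-1 model zero-shot to translate
# monolingual foreign-language text into English, forming synthetic
# (English, foreign) pairs, then train the English-to-Many direction on them;
# the paper repeats such rounds successively.
foreign_mono = ["Қалайсың?"]  # placeholder monolingual sentence
with torch.no_grad():
    enc = src_tok(foreign_mono, return_tensors="pt", padding=True)
    synth_ids = model.generate(enc.input_ids, max_new_tokens=32)
synthetic_english = tgt_tok.batch_decode(synth_ids, skip_special_tokens=True)

# Train English -> foreign on the synthetic pairs (target tokenization here
# reuses the English tokenizer purely for brevity of the sketch).
stage2_loss = xent_step(synthetic_english, foreign_mono)
```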