This paper proposes a technique for adding a new source or target language to an existing multilingual NMT model without re-training it on the initial set of languages. It consists of replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data. Some additional language-specific components may be trained to improve performance (e.g., Transformer layers or adapter modules). Because the parameters of the original model are not modified, its performance on the initial languages does not degrade. We show on two sets of experiments (small-scale on TED Talks, and large-scale on ParaCrawl) that this approach performs as well as or better than the more costly alternatives; and that it has excellent zero-shot performance: training on English-centric data is enough to translate between the new language and any of the initial languages.
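To make the described setup concrete, the following is a minimal sketch (not the authors' code) of how one might add a new source language to a frozen multilingual Transformer: the original parameters are frozen, and only a new language-specific embedding table is trained on the new language's parallel data. The names `base_model`, `new_vocab_size`, and `d_model` are hypothetical placeholders for the existing NMT model and its dimensions.

```python
import torch
import torch.nn as nn

def add_new_source_language(base_model: nn.Module, new_vocab_size: int, d_model: int):
    """Sketch: attach a trainable language-specific embedding to a frozen model."""
    # Freeze every parameter of the original model so its performance on the
    # initial languages cannot degrade.
    for p in base_model.parameters():
        p.requires_grad = False

    # Small language-specific embedding table that replaces the shared source
    # vocabulary for the new language.
    new_src_embedding = nn.Embedding(new_vocab_size, d_model)

    # Only the new embeddings (and, optionally, extra language-specific
    # components such as adapters) are updated during fine-tuning.
    optimizer = torch.optim.Adam(new_src_embedding.parameters(), lr=1e-4)
    return new_src_embedding, optimizer
```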