Although all-in-one-model multilingual neural machine translation (MNMT) has achieved remarkable progress, the convergence inconsistency in joint training is largely ignored, i.e., different language pairs reach convergence in different epochs. This leads the trained MNMT model to over-fit low-resource language translations while under-fitting high-resource ones. In this paper, we propose a novel training strategy named LSSD (Language-Specific Self-Distillation), which alleviates the convergence inconsistency and helps MNMT models achieve their best performance on each language pair simultaneously. Specifically, LSSD selects the language-specific best checkpoint for each language pair to teach the current model on the fly. Furthermore, we systematically explore three sample-level manipulations for knowledge transfer. Experimental results on three datasets show that LSSD obtains consistent improvements on all language pairs and achieves state-of-the-art performance.
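As a rough illustration of the training step described above, the following sketch assumes a standard cross-entropy objective combined with a KL-based distillation term toward a frozen, language-specific best checkpoint; the helper names (`maybe_update_teacher`, `best_checkpoints`, `kd_weight`) are hypothetical and not taken from the paper.

```python
# Minimal sketch of language-specific self-distillation (LSSD).
# Assumption: the model is a callable mapping (src, tgt_in) -> logits of shape (B, T, V).
import copy
import torch
import torch.nn.functional as F

def maybe_update_teacher(model, lang_pair, dev_bleu, best_bleu, best_checkpoints):
    """Keep a frozen copy of the best checkpoint seen so far for each language pair."""
    if dev_bleu > best_bleu.get(lang_pair, float("-inf")):
        best_bleu[lang_pair] = dev_bleu
        best_checkpoints[lang_pair] = copy.deepcopy(model).eval()

def lssd_loss(model, batch, lang_pair, best_checkpoints, kd_weight=0.5):
    """Cross-entropy on the gold target plus KL to the language-specific teacher."""
    logits = model(batch.src, batch.tgt_in)                      # student forward pass
    ce = F.cross_entropy(logits.transpose(1, 2), batch.tgt_out)  # standard NMT loss

    teacher = best_checkpoints.get(lang_pair)
    if teacher is None:                                          # no teacher yet: plain CE
        return ce

    with torch.no_grad():
        teacher_logits = teacher(batch.src, batch.tgt_in)        # frozen teacher forward pass
    kd = F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return (1 - kd_weight) * ce + kd_weight * kd
```

In this sketch, the teacher for each language pair is refreshed whenever that pair's validation score improves, so distillation always targets the pair's best checkpoint so far rather than a single global teacher.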