Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages, rather than training separate models for different language pairs. Learning a single model can enhance low-resource translation by leveraging data from multiple languages. However, the performance of an MNMT model is highly dependent on the set of languages used in training, as transferring knowledge from a diverse set of languages degrades translation performance due to negative transfer. In this paper, we propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer. HKD generates a set of multilingual teacher-assistant models via a selective knowledge distillation mechanism based on the language groups, and then distils the ultimate multilingual model from those assistants in an adaptive way. Experimental results derived from the TED dataset with 53 languages demonstrate the effectiveness of our approach in avoiding the negative transfer effect in MNMT, leading to improved translation performance (about 1 BLEU point on average) compared to strong baselines.
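To make the second stage concrete, the sketch below illustrates one plausible form of the adaptive distillation step, where the final multilingual student is trained against the group-level teacher-assistants. This is a hedged illustration in PyTorch, not the paper's exact objective: the adaptive weighting here assumes each assistant's contribution is proportional to how well it scores the reference translation, and all names (`adaptive_distillation_loss`, `student_logits`, `assistant_logits_list`) are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def adaptive_distillation_loss(student_logits, assistant_logits_list, target_ids,
                               temperature=2.0, pad_id=0):
    """Sketch of adaptive distillation from several teacher-assistants.

    student_logits:        (batch, seq, vocab) logits of the multilingual student
    assistant_logits_list: list of (batch, seq, vocab) logits from group-level assistants
    target_ids:            (batch, seq) reference token ids
    """
    # Standard translation loss of the student against the reference.
    nll = F.cross_entropy(student_logits.transpose(1, 2), target_ids,
                          ignore_index=pad_id)

    # Adaptive weights (one plausible scheme): assistants that assign higher
    # likelihood to the reference receive a larger share of the KD signal.
    with torch.no_grad():
        scores = []
        for a_logits in assistant_logits_list:
            a_nll = F.cross_entropy(a_logits.transpose(1, 2), target_ids,
                                    ignore_index=pad_id)
            scores.append(-a_nll)
        weights = torch.softmax(torch.stack(scores), dim=0)

    # Distil the student towards each assistant's temperature-softened distribution.
    kd = 0.0
    for w, a_logits in zip(weights, assistant_logits_list):
        kd = kd + w * F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(a_logits / temperature, dim=-1),
            reduction="batchmean") * temperature ** 2

    return nll + kd
```

In this reading, the language groups determine which assistants a given training sentence is distilled from, and the adaptive weights let the student rely more heavily on assistants that are competent for that sentence, which is one way the negative-transfer effect of a single undifferentiated teacher could be mitigated.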