OmniKnight: Multilingual Neural Machine Translation with Language-Specific Self-Distillation

Although all-in-one multilingual neural machine translation (MNMT) models have made remarkable progress in recent years, the checkpoint selected as best overall does not achieve the best performance on every language pair simultaneously. This is because the best checkpoints for individual language pairs (i.e., language-specific best checkpoints) are scattered across different epochs. In this paper, we present a novel training strategy, dubbed Language-Specific Self-Distillation (LSSD), for bridging the gap between the language-specific best checkpoints and the overall best checkpoint. Specifically, we treat each language-specific best checkpoint as a teacher whose knowledge is distilled into the overall best checkpoint. Moreover, we systematically explore three variants of LSSD, which perform distillation statically, selectively, and adaptively. Experimental results on two widely used benchmarks show that LSSD obtains consistent improvements across all language pairs and achieves the state of the art.
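The abstract does not give the loss function or the checkpoint-selection details, so the following is only a minimal sketch of the general idea under stated assumptions: a generic knowledge-distillation loss (cross-entropy plus a KL term toward the teacher) combined with a simple tracker that records which epoch scored best for each language pair. The names `BestCheckpointTracker`, `distillation_loss`, and the mixing weight `alpha` are hypothetical and are not taken from the paper; the static, selective, and adaptive variants are not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_probs, gold_index):
    """Negative log-likelihood of the gold token under the student."""
    return -math.log(student_probs[gold_index])


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student) over one token distribution."""
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )


def distillation_loss(student_logits, teacher_logits, gold_index, alpha=0.5):
    """Generic KD objective (an assumption, not the paper's exact loss):
    (1 - alpha) * cross-entropy + alpha * KL(teacher || student)."""
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    return (1 - alpha) * cross_entropy(s, gold_index) + alpha * kl_divergence(t, s)


class BestCheckpointTracker:
    """Hypothetical helper: remembers, per language pair, the epoch with
    the highest validation score. That epoch's checkpoint would act as
    the language-specific teacher for the pair."""

    def __init__(self):
        self.best = {}  # language pair -> (score, epoch)

    def update(self, epoch, scores):
        for pair, score in scores.items():
            if pair not in self.best or score > self.best[pair][0]:
                self.best[pair] = (score, epoch)

    def teacher_epoch(self, pair):
        return self.best[pair][1]


if __name__ == "__main__":
    tracker = BestCheckpointTracker()
    # Made-up validation scores purely for illustration.
    tracker.update(1, {"en-de": 20.0, "en-fr": 30.0})
    tracker.update(2, {"en-de": 21.0, "en-fr": 29.0})
    print(tracker.teacher_epoch("en-de"), tracker.teacher_epoch("en-fr"))
    print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], gold_index=2))
```

In this sketch the best epochs differ between the two language pairs, which mirrors the observation that language-specific best checkpoints are scattered across epochs; in a real system each batch would be paired with the teacher checkpoint for its own language pair.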

