Selective Knowledge Distillation for Neural Machine Translation

Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. Knowledge distillation, an active research area in NMT, is widely applied to improve model performance by transferring the teacher model's knowledge on each training sample. However, previous work rarely discusses the different impacts of, and connections among, these samples, which serve as the medium for transferring teacher knowledge. In this paper, we design a novel protocol that analyzes the different impacts of samples by comparing various partitions of the samples. Based on this protocol, we conduct extensive experiments and find that more teacher knowledge is not always better: knowledge over specific samples may even hurt the overall performance of knowledge distillation. To address these issues, we propose two simple yet effective strategies, batch-level and global-level selection, to pick suitable samples for distillation. We evaluate our approaches on two large-scale machine translation tasks, WMT'14 English->German and WMT'19 Chinese->English. Experimental results show that our approaches yield improvements of up to +1.28 and +0.89 BLEU points over the Transformer baseline, respectively.
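The difference between selecting samples within one batch and selecting them against a wider pool can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so everything below is an assumption made for illustration: samples are ranked by a per-sample loss value, the highest-loss fraction `ratio` is kept, and the global variant compares against a fixed-size FIFO queue of recent losses. The names `batch_level_select`, `GlobalLevelSelector`, `ratio`, and `queue_size` are hypothetical and do not come from the paper.

```python
from collections import deque


def batch_level_select(losses, ratio):
    """Return indices of the top `ratio` fraction of one batch, ranked by loss.

    Assumption: a higher loss marks a sample as suitable for distillation.
    """
    k = max(1, int(len(losses) * ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:k])


class GlobalLevelSelector:
    """Select samples whose loss reaches a threshold taken from recent batches.

    Assumption: a FIFO queue of recent losses approximates the global
    loss distribution; the threshold is the k-th largest loss in the queue.
    """

    def __init__(self, ratio, queue_size):
        self.ratio = ratio
        self.queue = deque(maxlen=queue_size)  # oldest losses drop out first

    def select(self, losses):
        self.queue.extend(losses)
        ranked = sorted(self.queue, reverse=True)
        k = max(1, int(len(ranked) * self.ratio))
        threshold = ranked[k - 1]
        return [i for i, loss in enumerate(losses) if loss >= threshold]


first_batch = [0.1, 2.0, 0.5, 1.5]
easy_batch = [0.2, 0.3, 0.1, 0.4]

selector = GlobalLevelSelector(ratio=0.5, queue_size=8)
global_first = selector.select(first_batch)
global_easy = selector.select(easy_batch)
batch_easy = batch_level_select(easy_batch, ratio=0.5)
```

In this sketch the batch-level rule always keeps half of `easy_batch`, while the global rule keeps fewer of its samples because most of them fall below the threshold set by the earlier, harder batch.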

