GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (\method) that is the key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed \method consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the previously pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments on machine translation show that \method gains up to 3 BLEU points on the WMT14 English-German language pair, surpassing the previous state-of-the-art pre-training-aided NMT by 1.4 BLEU points. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU point.
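As a minimal sketch of how the two fusion-related techniques could be realized, the snippet below shows one possible reading of the dynamic switching gate (a sigmoid gate mixing pre-trained LM states with NMT encoder states) and of asymptotic distillation (an auxiliary term pulling the NMT encoder toward the LM representation). The abstract only names these techniques; the class and function names, the MSE form of the distillation term, and the weighting are assumptions introduced here for illustration, not the paper's definitive formulation.

\begin{verbatim}
import torch
import torch.nn as nn

class DynamicSwitchingGate(nn.Module):
    """Hypothetical gate that fuses a pre-trained LM representation with the
    NMT encoder state, so pre-trained knowledge is mixed in rather than
    overwritten (one way to read the 'dynamic switching gate')."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj_lm = nn.Linear(d_model, d_model)
        self.proj_nmt = nn.Linear(d_model, d_model)

    def forward(self, h_lm: torch.Tensor, h_nmt: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per position and dimension, how much of the
        # pre-trained representation to keep.
        g = torch.sigmoid(self.proj_lm(h_lm) + self.proj_nmt(h_nmt))
        return g * h_lm + (1.0 - g) * h_nmt


def asymptotic_distillation_loss(h_nmt: torch.Tensor,
                                 h_lm: torch.Tensor,
                                 weight: float = 0.5) -> torch.Tensor:
    """Hypothetical auxiliary loss keeping NMT encoder states close to the
    frozen LM states; added to the usual translation loss (assumed MSE form)."""
    return weight * nn.functional.mse_loss(h_nmt, h_lm.detach())


if __name__ == "__main__":
    batch, seq_len, d_model = 2, 7, 512
    h_lm = torch.randn(batch, seq_len, d_model)    # e.g. BERT outputs
    h_nmt = torch.randn(batch, seq_len, d_model)   # NMT encoder outputs
    gate = DynamicSwitchingGate(d_model)
    fused = gate(h_lm, h_nmt)
    aux = asymptotic_distillation_loss(h_nmt, h_lm)
    print(fused.shape, aux.item())
\end{verbatim}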