Robust Domain Adaptation for Pre-trained Multilingual Neural Machine Translation Models

Recent literature has demonstrated the potential of multilingual Neural Machine Translation (mNMT) models. However, the most efficient models are not well suited to specialized industries. In these cases, internal data is scarce and expensive to obtain for every language pair, which makes fine-tuning an mNMT model on a specialized domain hard. In this context, we focus on a new task: domain adaptation of a pre-trained mNMT model on a single language pair while trying to maintain model quality on generic-domain data for all language pairs. The risk of degrading performance on the generic domain and on the other pairs is high. This task is key to the adoption of mNMT models in industry and lies at the border of many other tasks. We propose a fine-tuning procedure for the generic mNMT model that combines embedding freezing and an adversarial loss. Our experiments demonstrate that, compared to a naive standard approach, the procedure improves performance on specialized data with a minimal loss in initial performance on the generic domain for all language pairs (+10.0 BLEU on specialized data, and -0.01 to -0.5 BLEU on the WMT and Tatoeba datasets for the other pairs, with M2M100).
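The abstract names the two ingredients of the fine-tuning procedure (embedding freezing and an adversarial loss) but not how they are wired together. The pure-Python toy below is a minimal sketch under loudly stated assumptions: it uses domain-adversarial training with a gradient reversal layer, one common form of adversarial loss that may differ from the authors' formulation, and the scalar "encoder", "decoder" and "discriminator", as well as every name (`train_step`, `domain_loss`, `lam`, ...), are hypothetical. It only illustrates the mechanics: embeddings are read but never updated, the discriminator descends the domain loss, and the encoder receives the negated, `lam`-scaled domain gradient.

```python
import math

# Hypothetical toy model (NOT the paper's architecture):
#   frozen_emb: token -> scalar embedding; read, never updated (embedding freezing)
#   w: "encoder" weight,        h = w * mean(embeddings)
#   v: "decoder" weight,        y_hat = v * h, task loss (y_hat - target)^2
#   u, c: domain discriminator, p(specialized | h) = sigmoid(u * h + c)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(params, frozen_emb, tokens):
    """Return (mean embedding x, encoder output h, discriminator probability p)."""
    x = sum(frozen_emb[t] for t in tokens) / len(tokens)
    h = params["w"] * x
    p = sigmoid(params["u"] * h + params["c"])
    return x, h, p


def domain_loss(params, frozen_emb, batch):
    """Mean binary cross-entropy of the discriminator (domain 1 = specialized)."""
    total = 0.0
    for tokens, _target, domain in batch:
        _, _, p = forward(params, frozen_emb, tokens)
        total -= domain * math.log(p) + (1 - domain) * math.log(1 - p)
    return total / len(batch)


def train_step(params, frozen_emb, batch, lam=0.1, lr=0.05):
    """One SGD step on the task loss plus a gradient-reversed domain loss.

    The discriminator (u, c) descends the domain loss; the encoder (w) gets the
    domain gradient negated and scaled by lam, pushing it toward representations
    the discriminator cannot separate. frozen_emb has no entry in params, so it
    is never updated.
    """
    grads = {k: 0.0 for k in params}
    for tokens, target, domain in batch:
        x, h, p = forward(params, frozen_emb, tokens)
        err = params["v"] * h - target
        # Task loss gradients reach the decoder (v) and the encoder (w).
        grads["v"] += 2 * err * h
        grads["w"] += 2 * err * params["v"] * x
        # Domain loss: dBCE/dz = p - domain, with z = u * h + c.
        dz = p - domain
        grads["u"] += dz * h
        grads["c"] += dz
        # Gradient reversal: the encoder receives -lam * dL_domain/dw.
        grads["w"] -= lam * dz * params["u"] * x
    n = len(batch)
    return {k: params[k] - lr * grads[k] / n for k in params}


# Usage: one specialized-domain (1) and one generic-domain (0) example.
emb = {"the": 0.5, "patient": 1.5, "cat": -1.5}  # frozen toy embeddings
params = {"w": 1.0, "v": 1.0, "u": 0.5, "c": 0.0}
batch = [(["the", "patient"], 1.0, 1), (["the", "cat"], 0.0, 0)]
for _ in range(50):
    params = train_step(params, emb, batch)
```

In a full deep-learning framework, the freezing step would typically amount to excluding the embedding matrices from the optimizer; in this toy it is simply that `frozen_emb` is never written to.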
