Improving Multilingual Neural Machine Translation For Low-Resource Languages: French, English - Vietnamese

Prior work has demonstrated that a low-resource language pair can benefit from multilingual machine translation (MT) systems, which rely on the joint training of many language pairs. This paper proposes two simple strategies to address the rare-word issue in multilingual MT systems for two low-resource language pairs: French-Vietnamese and English-Vietnamese. The first strategy dynamically learns the similarity of tokens in the shared space among source languages, while the second attempts to improve the translation of rare words by updating their embeddings during training. In addition, we leverage monolingual data in multilingual MT systems to increase the amount of synthetic parallel data and so mitigate the data sparsity problem. We show significant improvements of up to +1.62 and +2.54 BLEU points over the bilingual baseline systems for both language pairs, and we release our datasets to the research community.
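The abstract does not say how the synthetic parallel data is produced from monolingual text. A common approach is back-translation, and the sketch below assumes that approach purely for illustration. The names `back_translate`, `toy_vi_to_en`, and the tiny lexicon are hypothetical stand-ins, not the authors' code or any library's API; a real system would use a trained Vietnamese-to-source translation model in place of the toy function.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each genuine target sentence with a machine-generated source sentence.

    Assumption: `reverse_translate` maps target-language text (here Vietnamese)
    back into the source language. It is a hypothetical callable.
    """
    pairs = []
    for target in target_monolingual:
        target = target.strip()
        if not target:
            continue  # skip empty lines in the monolingual data
        synthetic_source = reverse_translate(target)
        # The target side stays human-written; only the source side is synthetic.
        pairs.append((synthetic_source, target))
    return pairs


def toy_vi_to_en(sentence: str) -> str:
    """Placeholder 'model': word-by-word lookup, unknown words copied through."""
    lexicon = {"tôi": "i", "yêu": "love", "sách": "books"}
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


# Hypothetical toy data: one genuine pair plus monolingual Vietnamese text.
genuine = [("i read books", "tôi đọc sách")]
monolingual_vi = ["Tôi yêu sách", ""]

synthetic = back_translate(monolingual_vi, toy_vi_to_en)
training_corpus = genuine + synthetic  # mixed genuine + synthetic training data
```

Keeping the human-written sentence on the target side is the usual design choice in back-translation: the model learns to produce fluent target text even when the synthetic source side is noisy.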

