Phrase-Based & Neural Unsupervised Machine Translation

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of bitexts, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage automatic generation of parallel data by backtranslating with a backward model operating in the other direction, and the denoising effect of a language model trained on the target side. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT14 English-French and WMT16 German-English benchmarks, our models respectively obtain 27.1 and 23.6 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points.
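The back-translation loop described in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's implementation: the neural and phrase-based models are replaced here by hypothetical word-for-word lookup tables, the language-model denoising step is omitted, and every name (`translate`, `fit_table`, `iterative_back_translation`, the seed dictionaries) is an assumption made for illustration. The only point is the data flow: each direction is retrained on pairs whose source side is synthetic (produced by the opposite-direction model) and whose target side is real monolingual text.

```python
from collections import Counter, defaultdict


def translate(sentence, table):
    """Word-by-word translation; words missing from the table are copied."""
    return [table.get(word, word) for word in sentence]


def fit_table(pairs):
    """Re-estimate a word table from position-aligned (source, target) pairs.

    Toy assumption: both sentences have the same length and word order.
    """
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    # Keep the most frequent target word for each source word.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def iterative_back_translation(mono_src, mono_tgt, src2tgt, tgt2src, rounds=3):
    """Alternate between generating synthetic pairs and refitting both tables."""
    for _ in range(rounds):
        # Real target sentences, back-translated into synthetic sources:
        # training data for the source-to-target direction.
        synth_s2t = [(translate(t, tgt2src), t) for t in mono_tgt]
        # Real source sentences, back-translated into synthetic targets:
        # training data for the target-to-source direction.
        synth_t2s = [(translate(s, src2tgt), s) for s in mono_src]
        src2tgt = {**src2tgt, **fit_table(synth_s2t)}
        tgt2src = {**tgt2src, **fit_table(synth_t2s)}
    return src2tgt, tgt2src


# Hypothetical monolingual corpora and seed dictionaries (no parallel sentences).
mono_en = [["the", "cat", "sleeps"], ["the", "dog"]]
mono_fr = [["le", "chat", "dort"], ["le", "chien"]]
seed_en2fr = {"the": "le", "cat": "chat"}
seed_fr2en = {"le": "the", "chat": "cat"}

en2fr, fr2en = iterative_back_translation(mono_en, mono_fr, seed_en2fr, seed_fr2en)
print(translate(["the", "cat"], en2fr))
```

Because the toy tables simply copy unknown words, the loop cannot discover new translations on its own. In the approach the abstract describes, that role falls to the initial models and to the target-side language model, which this sketch leaves out.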

Related content

Machine Translation

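Machine Translation covers all branches of computational linguistics and language engineering that involve a multilingual aspect. Featured papers cover theoretical, descriptive, or computational aspects of any of the following topics: the compilation and use of bilingual and multilingual corpora, computer-assisted language teaching, the computational implications of non-Roman character sets, connectionist approaches to translation, contrastive linguistics, and others. Website: http://dblp.uni-trier.de/db/journals/mt/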

Survey paper on multilingual neural machine translation (34-page PDF): A Comprehensive Survey of Multilingual Neural Machine Translation

Low-Resource Text Classification using Domain-Adversarial Learning

Latest collection of 100+ papers on self-supervised learning

[Google] Unsupervised Machine Translation
