In spite of the recent success of neural machine translation (NMT) on standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need for parallel data and propose a novel method to train an NMT system in an unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points on the WMT 2014 French-to-English and German-to-English translation tasks. The model can also profit from small parallel corpora, attaining 21.81 and 15.24 points, respectively, when combined with 100,000 parallel sentences. Our implementation is released as an open source project.
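As a rough illustration of the denoising objective mentioned above, the sketch below corrupts an input sentence with random word dropout and local swaps before asking the encoder-decoder to reconstruct the original. The function name and the specific probabilities are illustrative assumptions, not taken from the released implementation:

```python
import random

def add_noise(tokens, drop_prob=0.1, swap_prob=0.1):
    """Corrupt a token sequence for denoising training (illustrative sketch):
    dropping and locally reordering words forces the autoencoder to learn
    sentence structure instead of simply copying its input."""
    # Word dropout: remove each token with probability drop_prob.
    kept = [t for t in tokens if random.random() > drop_prob]
    # Local shuffling: swap adjacent tokens with probability swap_prob.
    for i in range(len(kept) - 1):
        if random.random() < swap_prob:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return kept

sentence = "the cat sat on the mat".split()
noisy = add_noise(sentence)  # e.g. a shortened, locally reordered variant
```

During training, the model receives `noisy` as input and is optimized to reproduce `sentence`, while backtranslation alternates translation directions to generate synthetic parallel pairs from monolingual text.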