Machine translation has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, yet these still require tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is possible to learn to translate even without any parallel data. We propose a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space. By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data. We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores of 32.8 and 15.1 on the Multi30k and WMT English-French datasets, without using even a single parallel sentence at training time.
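The core idea above (one shared encoder mapping sentences from both languages into a common latent space, with a per-language decoder trained by reconstruction on monolingual data only) can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's architecture: it uses linear encoders/decoders over toy fixed-size vectors in place of a sequence-to-sequence model, synthetic "monolingual corpora", and plain squared-error reconstruction; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LATENT = 20, 5  # toy sizes (hypothetical, not from the paper)

# Toy stand-in for monolingual corpora: both "languages" express the same
# underlying content h through language-specific mixing matrices A.
A = {lang: rng.normal(0.0, 0.3, (VOCAB, LATENT)) for lang in ("src", "tgt")}
data = {lang: [A[lang] @ rng.normal(size=LATENT) for _ in range(50)]
        for lang in ("src", "tgt")}

# One SHARED encoder E, one decoder per language: sentences from both
# languages are mapped into the same latent space.
E = rng.normal(0.0, 0.1, (LATENT, VOCAB))
D = {lang: rng.normal(0.0, 0.1, (VOCAB, LATENT)) for lang in ("src", "tgt")}

def recon_loss(x, lang):
    """Squared reconstruction error ||D_lang(E x) - x||^2."""
    err = D[lang] @ (E @ x) - x
    return float(err @ err)

def train_step(x, lang, lr=0.02):
    """One gradient-descent step on the reconstruction loss."""
    global E
    z = E @ x                              # encode into the shared space
    err = D[lang] @ z - x                  # decode back into the same language
    grad_D = np.outer(err, z)              # d loss / d D[lang] (up to a factor of 2)
    grad_E = np.outer(D[lang].T @ err, x)  # d loss / d E
    D[lang] -= lr * grad_D
    E -= lr * grad_E

langs = ("src", "tgt")
initial = {l: np.mean([recon_loss(x, l) for x in data[l]]) for l in langs}
for _ in range(300):
    for lang in langs:
        for x in data[lang]:
            train_step(x, lang)
final = {l: np.mean([recon_loss(x, l) for x in data[l]]) for l in langs}
```

Reconstruction in both languages improves while sharing a single encoder. The full method additionally aligns the two latent distributions (e.g. with an adversarial objective) and uses denoising and back-translation losses so that decoding a source sentence with the target decoder yields a translation; those components are omitted from this sketch.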