Neural approaches have achieved state-of-the-art accuracy in machine translation but suffer from the high cost of collecting large-scale parallel data. Consequently, much research has been conducted on neural machine translation (NMT) with very limited parallel data, i.e., the low-resource setting. In this paper, we provide a survey of low-resource NMT and classify related work into three categories according to the auxiliary data used: (1) exploiting monolingual data of the source and/or target languages, (2) exploiting data from auxiliary languages, and (3) exploiting multi-modal data. We hope that our survey helps researchers better understand this field and inspires them to design better algorithms, and that it helps industry practitioners choose appropriate algorithms for their applications.