Neural machine translation (NMT) has recently emerged as a powerful alternative to conventional statistical approaches. However, its performance drops considerably on morphologically rich languages (MRLs), because neural engines usually fail to cope with the large vocabularies and high out-of-vocabulary (OOV) word rates of MRLs. Existing word-based models are therefore ill-suited to translating this set of languages. In this paper, we propose an extension to the state-of-the-art model of Chung et al. (2016), which works at the character level, and boost its decoder with target-side morphological information. In our architecture, an additional morphology table is plugged into the model. Each time the decoder samples from the target vocabulary, the table sends auxiliary signals from the most relevant affixes in order to enrich the decoder's current state and constrain it to provide better predictions. We evaluated our model on translation from English into German, Russian, and Turkish, three MRLs, and observed significant improvements.
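The abstract does not state the equations behind the morphology table, so the following is only a minimal sketch of one way such a lookup could work: the decoder state attends over a table of affix embeddings, and the weighted affix context is merged back into the state. The dot-product scoring, the tanh merge, the toy dimensions, and the names `affix_signal` and `enrich_state` are all assumptions made for illustration, not the paper's formulation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def affix_signal(decoder_state, morph_table):
    """Attend over affix embeddings; return (attention weights, context vector).

    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [dot(decoder_state, affix) for affix in morph_table]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * affix[i] for w, affix in zip(weights, morph_table))
        for i in range(dim)
    ]
    return weights, context


def enrich_state(decoder_state, morph_table):
    """Merge the affix context into the decoder state (hypothetical tanh merge)."""
    weights, context = affix_signal(decoder_state, morph_table)
    enriched = [math.tanh(h + c) for h, c in zip(decoder_state, context)]
    return weights, enriched


# Toy data: 4 hypothetical affix embeddings of dimension 3.
rng = random.Random(0)
morph_table = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
decoder_state = [0.5, -0.2, 0.1]

weights, enriched = enrich_state(decoder_state, morph_table)
```

Here `weights` shows which affixes the current state considers most relevant, and `enriched` stands in for the state the decoder would use for its next character prediction. In a trained model the table and any projection matrices would be learned jointly with the rest of the network.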