Neural machine translation (NMT) performs poorly when a limited vocabulary fails to cover the source or target side adequately, which happens frequently with morphologically rich languages. To address this problem, previous work has focused on adjusting translation granularity or expanding the vocabulary size. However, morphological information, which may further improve translation quality, has received relatively little attention in NMT architectures. We propose a novel method that not only reduces data sparsity but also models morphology through a simple but effective mechanism. By predicting the stem and suffix separately during decoding, our system achieves an improvement of up to 1.98 BLEU over previous work on English-to-Russian translation. Our method is orthogonal to the choice of NMT architecture and yields consistent improvements across various domains.
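To make the sparsity argument concrete, the following is a minimal sketch of how splitting target words into a stem and a suffix can shrink the set of units a decoder has to predict. It is an illustration under stated assumptions, not the segmentation or model described in the abstract: the suffix list `SUFFIXES`, the longest-match rule in `split_word`, and the helper `join_word` are all hypothetical, and a real system would presumably rely on a proper stemmer or morphological analyzer.

```python
# Toy illustration only. The suffix list and the longest-match splitting rule
# are assumptions made for this sketch, not the method from the abstract.
SUFFIXES = ["ами", "ой", "а", "и", "у", "е"]  # hypothetical Russian endings


def split_word(word, suffixes=SUFFIXES, min_stem=2):
    """Split a word into (stem, suffix) using the longest matching suffix."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)], suf
    return word, ""  # no known suffix: keep the word whole, empty suffix


def join_word(stem, suffix):
    """Recombine a predicted stem and suffix into a surface form."""
    return stem + suffix


# Inflected forms of two nouns; each form is a separate word-level vocabulary entry.
words = [
    "книга", "книги", "книгу", "книге", "книгой", "книгами",
    "лампа", "лампу", "лампе", "лампой", "лампами",
]

pairs = [split_word(w) for w in words]
stems = {stem for stem, _ in pairs}
suffixes = {suf for _, suf in pairs}

print("word-level vocabulary size:", len(set(words)))
print("stem vocabulary size:", len(stems))
print("suffix vocabulary size:", len(suffixes))
```

In this toy setting the stem and suffix inventories together are smaller than the word-level vocabulary, and the gap widens as more nouns sharing the same endings are added. That is the intuition behind predicting the two parts separately at each decoding step.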