Non-autoregressive translation (NAT) significantly accelerates inference by predicting the entire target sequence in parallel. However, recent studies show that NAT is weak at learning high-mode knowledge such as one-to-many translations. We argue that modes can be divided into various granularities, which can be learned from easy to hard. In this study, we empirically show that NAT models are more prone to learning fine-grained, lower-mode knowledge, such as words and phrases, than sentence-level knowledge. Based on this observation, we propose progressive multi-granularity training for NAT. More specifically, to make the most of the training data, we break down sentence-level examples into three granularities, i.e., words, phrases, and sentences, and as training progresses, we progressively increase the granularity. Experiments on Romanian-English, English-German, Chinese-English, and Japanese-English demonstrate that our approach improves phrase translation accuracy and model reordering ability, resulting in better translation quality against strong NAT baselines. We also show that more deterministic fine-grained knowledge can further enhance performance.
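To make the easy-to-hard schedule concrete, below is a minimal sketch of the progressive multi-granularity curriculum as described above. All names here (GRANULARITIES, granularities_at_step, sample_batch) are hypothetical illustrations rather than the authors' released code, and it assumes the sentence-level corpus has already been decomposed into word- and phrase-level sub-examples (e.g., via word alignment and phrase extraction).

```python
import random

# Granularities ordered from fine-grained/low-mode (easy) to
# coarse-grained/high-mode (hard), mirroring the easy-to-hard schedule.
GRANULARITIES = ["word", "phrase", "sentence"]

def granularities_at_step(step: int, total_steps: int) -> list[str]:
    """Progressively widen the training pool: start with words only,
    then add phrases, and finally full sentences.
    Assumption: the three phases split total_steps evenly."""
    phase = min(step * len(GRANULARITIES) // total_steps,
                len(GRANULARITIES) - 1)
    return GRANULARITIES[: phase + 1]

def sample_batch(data: dict[str, list], step: int, total_steps: int,
                 batch_size: int = 4) -> list:
    """Draw a training batch from all granularities unlocked so far."""
    pool = [ex for g in granularities_at_step(step, total_steps)
            for ex in data[g]]
    return random.sample(pool, min(batch_size, len(pool)))

# Toy usage: word/phrase sub-examples extracted from one sentence pair.
data = {
    "word": [("Haus", "house"), ("Katze", "cat")],
    "phrase": [("das kleine Haus", "the small house")],
    "sentence": [("Die Katze ist im kleinen Haus .",
                  "The cat is in the small house .")],
}
for step in (0, 40, 90):
    print(step, granularities_at_step(step, total_steps=100))
    # -> ['word'], then ['word', 'phrase'], then all three
```

The schedule here is a simple step-count heuristic; any monotone criterion (e.g., validation loss plateaus) could gate the transitions instead.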