Non-autoregressive (NAR) machine translation has recently achieved significant improvements, and now outperforms autoregressive (AR) models on some benchmarks, providing an efficient alternative to AR inference. However, while AR translation is often implemented using multilingual models that benefit from transfer between languages and from improved serving efficiency, multilingual NAR models remain relatively unexplored. Taking Connectionist Temporal Classification (CTC) as an example NAR model and Imputer as a semi-NAR model, we present a comprehensive empirical study of multilingual NAR. We test its capabilities with respect to positive transfer between related languages and negative transfer under capacity constraints. As NAR models require distilled training sets, we carefully study the impact of bilingual versus multilingual teachers. Finally, we fit a scaling law for multilingual NAR, which quantifies its performance relative to the AR model as model scale increases.
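The scaling law mentioned above is conventionally fit as a linear regression in log-log space, i.e. assuming loss(N) = a · N^(−α) for model size N. A minimal sketch of such a fit; the function name and the synthetic data points are illustrative assumptions, not values from this paper:

```python
import math

def fit_power_law(sizes, losses):
    """Ordinary least squares on log(loss) = log(a) - alpha * log(N)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    # Return (a, alpha) of loss = a * N**(-alpha)
    return math.exp(intercept), -slope

# Synthetic, exactly power-law data (illustrative only)
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [3.0 * n ** -0.25 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
print(round(a, 6), round(alpha, 6))  # → 3.0 0.25
```

Comparing the fitted exponent α of the NAR model against that of an AR baseline is what quantifies their relative performance as scale increases.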