Recent work on non-autoregressive neural machine translation (NAT) aims to improve efficiency through parallel decoding without sacrificing quality. However, existing NAT methods are either inferior to Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM), a method to learn word interdependency for single-pass parallel generation models. With GLM, we develop the Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translations with an 8x-15x speedup. Experiments on multiple WMT language directions show that GLAT outperforms all previous single-pass non-autoregressive methods and is nearly comparable to Transformer, reducing the gap to 0.25-0.9 BLEU points.
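To make the idea of learning word interdependency with a single decoding pass more concrete, below is a minimal PyTorch sketch of a GLM-style glancing sampling step during training: the decoder first predicts all target tokens in parallel, and a number of reference tokens proportional to the prediction's distance from the reference is revealed as input for a second pass, with the loss computed only on the unrevealed positions. The names MASK_ID, GLANCE_RATIO, and glancing_sample are illustrative assumptions, not the paper's released implementation.

```python
# A hedged sketch of glancing sampling, assuming 1-D LongTensor sequences of
# equal length; MASK_ID and GLANCE_RATIO are hypothetical placeholders.
import torch

MASK_ID = 0          # id used for fully masked decoder input positions
GLANCE_RATIO = 0.5   # fraction of wrong positions whose reference tokens are revealed

def glancing_sample(first_pass_pred: torch.Tensor, reference: torch.Tensor):
    """Return (glanced_input, loss_mask) for the second (training) pass."""
    # Hamming distance between the parallel first-pass prediction and the reference:
    # harder sentences get more reference tokens revealed, easier ones get fewer.
    distance = (first_pass_pred != reference).sum().item()
    num_reveal = int(GLANCE_RATIO * distance)

    # Randomly choose which positions to reveal.
    reveal_idx = torch.randperm(reference.size(0))[:num_reveal]

    # Decoder input: revealed reference tokens, everything else masked.
    glanced_input = torch.full_like(reference, MASK_ID)
    glanced_input[reveal_idx] = reference[reveal_idx]

    # Train only on positions that were NOT revealed, so the model learns to
    # predict them from the glanced context in a single parallel pass.
    loss_mask = torch.ones_like(reference, dtype=torch.bool)
    loss_mask[reveal_idx] = False
    return glanced_input, loss_mask

# Toy usage: a 6-token reference where the first parallel pass got 4 tokens wrong.
reference = torch.tensor([11, 12, 13, 14, 15, 16])
first_pass = torch.tensor([11, 99, 99, 14, 99, 99])
glanced, mask = glancing_sample(first_pass, reference)
print(glanced, mask)
```

At inference time no reference is available, so the model decodes once from a fully masked input, which is what preserves the single-pass speedup reported above.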