Fully non-autoregressive neural machine translation (NAT) predicts all output tokens simultaneously with a single forward pass of the network, which significantly reduces inference latency at the expense of a quality drop relative to the Transformer baseline. In this work, we aim to close this performance gap while preserving the latency advantage. We first examine the fundamental issues of fully NAT models and adopt dependency reduction in the learning space of output tokens as our guiding principle. We then revisit methods from four different aspects that have proven effective for improving NAT models, and carefully combine these techniques with necessary modifications. Extensive experiments on three translation benchmarks show that the proposed system achieves new state-of-the-art results for fully NAT models and obtains performance comparable to autoregressive and iterative NAT systems. For instance, one of the proposed models achieves 27.49 BLEU on WMT14 En-De with an approximately 16.5x speedup at inference time.
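The latency contrast at the heart of the abstract can be made concrete with a toy sketch (not the paper's implementation): an autoregressive decoder needs one forward pass per output token, while a fully non-autoregressive decoder emits the whole sequence in a single pass. The `predict_next` and `predict_all` functions below are hypothetical stand-ins for the two model types.

```python
def autoregressive_decode(predict_next, src, max_len):
    """One model call per token: predict_next(src, prefix) -> next token or None."""
    out, calls = [], 0
    for _ in range(max_len):
        tok = predict_next(src, out)
        calls += 1
        if tok is None:  # hypothetical end-of-sequence signal
            break
        out.append(tok)
    return out, calls

def nat_decode(predict_all, src):
    """Single model call: predict_all(src) -> full token sequence at once."""
    return predict_all(src), 1

# Dummy "models" for illustration only.
tokens = ["wir", "gehen", "nach", "hause"]
next_fn = lambda src, prefix: tokens[len(prefix)] if len(prefix) < len(tokens) else None
all_fn = lambda src: list(tokens)

ar_out, ar_calls = autoregressive_decode(next_fn, "we go home", max_len=10)
nat_out, nat_calls = nat_decode(all_fn, "we go home")
print(ar_calls, "vs", nat_calls, "forward passes")  # 5 vs 1 forward passes
```

The speedup reported in the abstract comes precisely from collapsing the `calls`-per-token loop into one parallel prediction step; the challenge the paper addresses is that this removes the conditional dependencies between output tokens that autoregressive decoding exploits.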