How do we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up the inference, but their quality is still inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves the BLEU scores compared with respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks, while being 14.8 times more efficient in inference.
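To make the key insight concrete, below is a minimal PyTorch-style sketch of deep supervision with layer-wise prediction feeding. It is an illustrative assumption, not the authors' implementation: the module names (DSLPDecoderSketch, out_proj, pred_embed, fuse), the argmax-based prediction feeding, and the concatenate-then-project fusion are all placeholders chosen to show the idea of supervising every decoder layer and passing each layer's prediction to the next.

```python
# Illustrative sketch of deep supervision + layer-wise prediction (DSLP idea).
# Assumes PyTorch; layer names and the fusion scheme are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSLPDecoderSketch(nn.Module):
    def __init__(self, num_layers=6, d_model=512, nhead=8, vocab_size=32000):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        ])
        self.out_proj = nn.Linear(d_model, vocab_size)       # shared output head for every layer
        self.pred_embed = nn.Embedding(vocab_size, d_model)  # embeds each layer's predicted tokens
        self.fuse = nn.Linear(2 * d_model, d_model)          # mixes hidden state with prediction

    def forward(self, x, memory, target=None):
        """x: (batch, tgt_len, d_model) decoder input; memory: encoder output;
        target: (batch, tgt_len) gold token ids for training."""
        losses = []
        for layer in self.layers:
            h = layer(x, memory)                  # non-autoregressive: no causal mask is passed
            logits = self.out_proj(h)             # intermediate prediction at this layer
            if target is not None:                # deep supervision: a loss at every layer
                losses.append(F.cross_entropy(
                    logits.reshape(-1, logits.size(-1)), target.reshape(-1)))
            pred_tokens = logits.argmax(dim=-1)   # layer-wise prediction
            # Feed the prediction forward: combine it with the hidden state
            # before the next decoder layer sees it.
            x = self.fuse(torch.cat([h, self.pred_embed(pred_tokens)], dim=-1))
        loss = torch.stack(losses).mean() if losses else None
        return logits, loss                       # last layer's logits + averaged layer losses
```

At inference, all positions are decoded in parallel in a single forward pass (or a few refinement passes, depending on the base model), which is where the speedup over token-by-token autoregressive decoding comes from.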