Hybrid-Regressive Neural Machine Translation

In this work, we empirically confirm that non-autoregressive translation with an iterative refinement mechanism (IR-NAT) suffers from poor acceleration robustness: its speedup is more sensitive to decoding batch size and computing device than that of autoregressive translation (AT). Motivated by this finding, we investigate how to better combine the strengths of the autoregressive and non-autoregressive translation paradigms. To this end, we demonstrate through synthetic experiments that prompting one-shot non-autoregressive translation with a small number of AT predictions allows it to match the performance of IR-NAT. Following this line, we propose a new two-stage translation prototype called hybrid-regressive translation (HRT). Specifically, HRT first generates a discontinuous sequence via autoregression (e.g., making a prediction every k tokens, k>1) and then fills in all previously skipped tokens at once in a non-autoregressive manner. We also propose a bag of techniques to train HRT effectively and efficiently without adding any model parameters. HRT achieves a state-of-the-art BLEU score of 28.49 on the WMT En-De task and is at least 1.5x faster than AT, regardless of batch size and device. A further benefit of HRT is that it inherits the good characteristics of AT in the deep-encoder-shallow-decoder architecture. Concretely, compared to the vanilla HRT with a 6-layer encoder and 6-layer decoder, HRT with a 12-layer encoder and 1-layer decoder doubles the inference speed again on both GPU and CPU without BLEU loss.
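The two-stage schedule described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`hrt_decode`, `skip_at_step`, `nat_fill`), the choice of predicting the last position of each chunk of k tokens, and the oracle "models" that simply read from a known reference are all assumptions made for illustration. The sketch only shows how the number of sequential decoder passes drops from one per token to roughly one per k tokens plus a single fill-in pass.

```python
from typing import Callable, List, Optional, Tuple

Token = str
MASK: Optional[Token] = None  # marks a position that stage 1 skipped


def hrt_decode(
    length: int,
    k: int,
    skip_at_step: Callable[[List[Token]], Token],
    nat_fill: Callable[[List[Optional[Token]]], List[Token]],
) -> Tuple[List[Token], int]:
    """Return the decoded tokens and the number of sequential passes used."""
    if k <= 1:
        raise ValueError("k must be greater than 1")

    # Stage 1 (autoregressive): predict one token per chunk of k positions.
    # Assumption: the predicted position is the last one in each chunk.
    kept_positions = list(range(k - 1, length, k))
    skeleton: List[Token] = []
    for _ in kept_positions:
        skeleton.append(skip_at_step(skeleton))  # one pass per kept token
    passes = len(kept_positions)

    # Stage 2 (non-autoregressive): fill every skipped position in one pass.
    draft: List[Optional[Token]] = [MASK] * length
    for pos, tok in zip(kept_positions, skeleton):
        draft[pos] = tok
    output = nat_fill(draft)
    passes += 1
    return output, passes


# Hypothetical oracle "models" that read from a known reference sentence.
reference = "we propose a two stage translation prototype".split()
k = 2


def oracle_skip_at(prefix: List[Token]) -> Token:
    # The next kept token sits at the end of the next chunk of k positions.
    return reference[(len(prefix) + 1) * k - 1]


def oracle_fill(draft: List[Optional[Token]]) -> List[Token]:
    # Keep stage-1 tokens and fill each masked slot from the reference.
    return [reference[i] if tok is MASK else tok for i, tok in enumerate(draft)]


tokens, passes = hrt_decode(len(reference), k, oracle_skip_at, oracle_fill)
at_passes = len(reference)  # plain AT needs one pass per target token
```

With this 7-token toy reference and k=2, the sketch uses 4 sequential passes (3 autoregressive plus 1 fill-in) where token-by-token AT would use 7. Real speedups depend on the model, batch size, and device, as the abstract notes.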

