Unlike previous work that accelerates translation at the cost of quality, we propose Generalized Aggressive Decoding (GAD), a novel decoding paradigm for lossless speedup of autoregressive translation through the collaboration of autoregressive translation and non-autoregressive translation (NAT) with the Transformer. At each decoding iteration, GAD aggressively decodes a number of tokens with NAT as a draft and then verifies them in the autoregressive manner; only the tokens that pass verification are accepted as decoded tokens. GAD achieves the same results as autoregressive translation but much more efficiently, because both NAT drafting and autoregressive verification compute in parallel. We conduct experiments on four standard WMT benchmarks and confirm that vanilla GAD yields exactly the same results as greedy decoding with around a $3\times$ speedup. Its variant (GAD++), which uses an advanced verification strategy, not only outperforms greedy translation and achieves translation quality comparable to beam search, but also further improves decoding speed, resulting in around a $5\times$ speedup over autoregressive translation. Moreover, GAD can be easily generalized for lossless speedup of other seq2seq tasks such as abstractive summarization, and it benefits more from stronger computing devices, demonstrating its potential to become a de facto decoding paradigm in the future. Our models and code are available at https://github.com/hemingkx/GAD.
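The draft-then-verify loop described above can be illustrated with toy stand-ins. The following is a minimal sketch, not the released implementation: `ar_next`, `draft`, `TARGET`, and the drafter's error pattern are hypothetical placeholders for a real autoregressive model and a real NAT drafter, and the sequential verification loop stands in for what a Transformer would compute in a single parallel forward pass.

```python
from typing import Callable, List, Tuple

Token = int
EOS: Token = 0

# Hypothetical "reference" output that the toy autoregressive model produces.
TARGET: List[Token] = [5, 3, 8, 2, 7, 1, EOS]


def ar_next(prefix: List[Token]) -> Token:
    """Toy greedy autoregressive step: next token given the prefix."""
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else EOS


def draft(prefix: List[Token], k: int) -> List[Token]:
    """Toy drafter: proposes up to k tokens at once, wrong at every third position."""
    start = len(prefix)
    stop = min(start + k, len(TARGET))
    return [99 if i % 3 == 2 else TARGET[i] for i in range(start, stop)]


def greedy_decode(step: Callable[[List[Token]], Token], max_len: int) -> List[Token]:
    """Plain autoregressive greedy decoding, one token per iteration."""
    out: List[Token] = []
    while len(out) < max_len:
        out.append(step(out))
        if out[-1] == EOS:
            break
    return out


def draft_then_verify(
    step: Callable[[List[Token]], Token],
    drafter: Callable[[List[Token], int], List[Token]],
    k: int,
    max_len: int,
) -> Tuple[List[Token], int]:
    """Accept drafted tokens while they match the autoregressive prediction."""
    out: List[Token] = []
    iterations = 0
    while len(out) < max_len and (not out or out[-1] != EOS):
        iterations += 1
        proposal = list(drafter(out, k))[:k]
        if not proposal:
            # No draft available: fall back to one ordinary autoregressive step.
            out.append(step(out))
            continue
        for tok in proposal:
            # A real model would score all k positions in one parallel pass.
            target = step(out)
            out.append(target)  # always keep the autoregressive token
            if target != tok or target == EOS or len(out) >= max_len:
                break  # first mismatch: discard the rest of the draft
    return out, iterations


fast_out, fast_iters = draft_then_verify(ar_next, draft, k=4, max_len=20)
slow_out = greedy_decode(ar_next, max_len=20)
print(fast_out == slow_out, fast_iters, len(slow_out))
```

Because every emitted token is the autoregressive model's own prediction, the output matches greedy decoding by construction; the saving comes from needing fewer iterations whenever the drafter guesses several consecutive tokens correctly.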