We study lossless acceleration for seq2seq generation with a novel decoding algorithm, Aggressive Decoding. Unlike previous efforts (e.g., non-autoregressive decoding) that speed up seq2seq generation at the cost of quality, our approach aims to yield generation identical to (or better than) autoregressive decoding with a significant speedup, achieved by innovative cooperation between aggressive decoding and verification, both of which are efficient thanks to parallel computing. We propose two Aggressive Decoding paradigms for two kinds of seq2seq tasks: 1) For seq2seq tasks whose inputs and outputs are highly similar (e.g., Grammatical Error Correction), we propose Input-guided Aggressive Decoding (IAD), which aggressively copies tokens from the input sentence as a drafted output to verify in parallel; 2) For other general seq2seq tasks (e.g., Machine Translation), we propose Generalized Aggressive Decoding (GAD), which first employs an additional non-autoregressive model for aggressive decoding and then verifies the draft in parallel in the autoregressive manner. We test Aggressive Decoding with the most popular 6-layer Transformer model on GPU in multiple seq2seq tasks: 1) IAD introduces a 7x-9x speedup for the Transformer in Grammatical Error Correction and Text Simplification with results identical to greedy decoding; 2) GAD achieves a 3x-5x speedup with identical or even better quality in two important seq2seq tasks: Machine Translation and Abstractive Summarization. Moreover, Aggressive Decoding can benefit even more from stronger computing devices that are better at parallel computing. Given its lossless quality as well as its significant and promising speedup, we believe Aggressive Decoding may evolve into a de facto standard for efficient and lossless seq2seq generation in the near future.
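The draft-then-verify idea behind IAD can be sketched in a few lines. The toy code below is only an illustration of the control flow, not the paper's implementation: `greedy_next` is a hypothetical stand-in for one autoregressive model step, and the input sentence serves as the draft. In the real algorithm, all draft positions are scored in a single parallel forward pass; here that verification is emulated sequentially, and the accepted output is guaranteed to match what plain greedy decoding would produce.

```python
# Illustrative sketch of Input-guided Aggressive Decoding (IAD).
# All names are hypothetical; a real model replaces `greedy_next`.

def greedy_next(prefix, target):
    """Stand-in for one autoregressive decoding step: for this toy,
    the 'model' deterministically emits a fixed target sequence."""
    return target[len(prefix)] if len(prefix) < len(target) else None

def aggressive_decode(draft, target):
    """Verify a drafted output (copied from the input) against the model.

    draft:  tokens aggressively copied from the input sentence.
    target: what plain greedy decoding would emit (the toy model's answer).
    Returns the decoded output, identical to greedy decoding's result.
    """
    output, i = [], 0
    while True:
        # Verification: accept the longest draft prefix that greedy
        # decoding would also produce (done in one parallel pass in IAD).
        while i < len(draft) and greedy_next(output, target) == draft[i]:
            output.append(draft[i])
            i += 1
        nxt = greedy_next(output, target)  # first disagreement, or end
        if nxt is None:
            return output
        output.append(nxt)  # take the model's token at the mismatch
        i += 1              # skip the rejected draft token and continue
```

For a Grammatical Error Correction example, `aggressive_decode(["I", "has", "a", "dog"], ["I", "have", "a", "dog"])` accepts "I" from the draft, fixes "has" to "have" at the single mismatch, then resumes copying, so only the erroneous position costs an autoregressive step.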