Transformer-based autoregressive (AR) methods have achieved impressive performance on a variety of sequence-to-sequence generation tasks, e.g., neural machine translation, summarization, and code generation, but suffer from low inference efficiency. To speed up inference, many non-autoregressive (NAR) strategies have been proposed in recent years. Among them, the conditional masked language model (CMLM) is one of the most versatile frameworks, as it supports many different sequence generation scenarios and achieves very competitive performance on these tasks. In this paper, we further introduce a simple yet effective adaptive masking-over-masking strategy to enhance the refinement capability of the decoder and make encoder optimization easier. Experiments on \textbf{3} different tasks (neural machine translation, summarization, and code generation) with \textbf{15} datasets in total confirm that our proposed simple method achieves significant performance improvements over the strong CMLM model. Surprisingly, our proposed model yields state-of-the-art performance on neural machine translation (\textbf{34.62} BLEU on WMT16 EN$\to$RO, \textbf{34.82} BLEU on WMT16 RO$\to$EN, and \textbf{34.84} BLEU on IWSLT DE$\to$EN) and even better performance than the \textbf{AR} Transformer on \textbf{7} benchmark datasets with at least \textbf{2.2$\times$} speedup. Our code is available on GitHub.
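To make the adaptive masking-over-masking idea more concrete, below is a minimal sketch of one plausible decoder-side re-masking routine under our own assumptions: the function name \texttt{adaptive\_remask}, the error-rate heuristic, and the ratio bounds are illustrative and are not the exact procedure used in the paper.
\begin{verbatim}
# Hypothetical sketch of an adaptive decoder-side re-masking step for a
# CMLM-style refinement pass. Names, the error-rate heuristic, and the
# ratio bounds are illustrative assumptions, not the paper's implementation.
import torch

def adaptive_remask(pred_tokens, ref_tokens, mask_id,
                    min_ratio=0.1, max_ratio=0.9):
    """Re-mask decoder inputs for the next refinement iteration.

    The masking ratio adapts to how many tokens the decoder currently
    predicts incorrectly: the worse the draft, the more positions are
    masked again for refinement.
    """
    wrong = pred_tokens.ne(ref_tokens)                   # (batch, seq) bool
    ratio = wrong.float().mean(dim=-1).clamp(min_ratio, max_ratio)
    # Prefer re-masking positions that are currently wrong; random scores
    # break ties among the remaining positions.
    scores = torch.rand_like(pred_tokens, dtype=torch.float) + wrong.float()
    n_mask = (ratio * pred_tokens.size(-1)).long()       # tokens to re-mask per sentence
    remasked = pred_tokens.clone()
    for i in range(pred_tokens.size(0)):
        idx = scores[i].topk(int(n_mask[i])).indices
        remasked[i, idx] = mask_id
    return remasked
\end{verbatim}
In an iterative decoding loop, such a routine would sit between refinement passes; the encoder-side benefit mentioned above could be targeted by a symmetric, adaptively sized masking step over source tokens, which is not shown here.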