Diffusion models, a new paradigm for generative modeling, have achieved great success in image, audio, and video generation. However, given the discrete categorical nature of text, extending continuous diffusion models to natural language is non-trivial, and text diffusion models remain under-explored. Sequence-to-sequence text generation is one of the essential topics in natural language processing. In this work, we apply diffusion models to sequence-to-sequence text generation and explore whether the superior generation performance of diffusion models transfers to the natural language domain. We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation. SeqDiffuSeq uses an encoder-decoder Transformer architecture to model the denoising function. To improve generation quality, SeqDiffuSeq combines the self-conditioning technique with a newly proposed adaptive noise schedule. The adaptive noise schedule distributes the difficulty of denoising evenly across time steps and assigns exclusive noise schedules to tokens at different positions. Experimental results show that SeqDiffuSeq performs well on sequence-to-sequence generation in terms of both text quality and inference time.
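To make the self-conditioning idea concrete, the sketch below shows the mechanism in its generic form (feeding the denoiser's previous estimate of the clean sample back in as an extra input at the next sampling step). This is a minimal toy illustration, not SeqDiffuSeq's actual implementation: the linear `denoise_fn`, the dimension `d`, and the timestep loop are all hypothetical placeholders for the paper's Transformer denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_fn(z_t, t, x0_prev, w):
    # Toy stand-in for the denoising network: a linear map over the
    # noisy latent, the previous clean-sample estimate, and the timestep.
    # In SeqDiffuSeq this role is played by an encoder-decoder Transformer.
    inp = np.concatenate([z_t, x0_prev, [float(t)]])
    return w @ inp

d = 4                                   # hypothetical embedding dimension
w = rng.normal(size=(d, 2 * d + 1))     # random toy "weights"
z_t = rng.normal(size=d)                # noisy latent at the current step

# Self-conditioning: at the first step there is no previous prediction,
# so a zero vector is used; afterwards the prediction is fed back in.
x0_hat = np.zeros(d)
for t in [3, 2, 1, 0]:
    x0_hat = denoise_fn(z_t, t, x0_hat, w)
```

During training, self-conditioning is typically applied stochastically (e.g. half the time the extra input is zeroed out) so the network learns to work both with and without its own previous estimate.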