Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented diffusion as an alternative to autoregressive language generation. We instead view diffusion as a complementary method that can augment the generative capabilities of existing pre-trained language models. We demonstrate that continuous diffusion models can be learned in the latent space of a pre-trained encoder-decoder model, enabling us to sample continuous latent representations that can be decoded into natural language with the pre-trained decoder. We show that our latent diffusion models are more effective at sampling novel text from data distributions than a strong autoregressive baseline and also enable controllable generation.
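Below is a minimal sketch of the kind of latent diffusion pipeline the abstract describes: a denoiser is trained on latent vectors produced by a frozen pre-trained encoder, and reverse diffusion then samples new latents that the frozen pre-trained decoder maps back to text. This is not the authors' released code; the `Denoiser` architecture, the latent dimension, and the stand-in encoder/decoder are illustrative assumptions, and the paper's actual model will differ in detail.

```python
# Hypothetical sketch of latent diffusion for text generation (DDPM-style,
# epsilon prediction). The pre-trained encoder/decoder (e.g. BART/T5) are
# assumed to exist and stay frozen; only the denoiser is trained.
import torch
import torch.nn as nn

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative product \bar{alpha}_t

class Denoiser(nn.Module):
    """Predicts the noise added to a latent vector at timestep t (toy MLP)."""
    def __init__(self, d_latent=64, d_hidden=256):
        super().__init__()
        self.t_embed = nn.Embedding(T, d_hidden)
        self.net = nn.Sequential(
            nn.Linear(d_latent + d_hidden, d_hidden), nn.SiLU(),
            nn.Linear(d_hidden, d_latent),
        )

    def forward(self, z_t, t):
        return self.net(torch.cat([z_t, self.t_embed(t)], dim=-1))

def diffusion_loss(denoiser, z0):
    """Standard epsilon-prediction objective on clean encoder latents z0."""
    t = torch.randint(0, T, (z0.size(0),))
    eps = torch.randn_like(z0)
    ab = alpha_bar[t].unsqueeze(-1)
    z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * eps   # forward noising
    return nn.functional.mse_loss(denoiser(z_t, t), eps)

@torch.no_grad()
def sample_latents(denoiser, n, d_latent=64):
    """Ancestral sampling: start from Gaussian noise, denoise step by step."""
    z = torch.randn(n, d_latent)
    for t in reversed(range(T)):
        t_batch = torch.full((n,), t, dtype=torch.long)
        eps_hat = denoiser(z, t_batch)
        a, ab = alphas[t], alpha_bar[t]
        z = (z - (1 - a) / (1 - ab).sqrt() * eps_hat) / a.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return z

# Usage sketch: encode text with the frozen pre-trained encoder to obtain z0,
# minimize diffusion_loss(denoiser, z0), then feed sample_latents(...) outputs
# to the frozen pre-trained decoder to obtain natural-language samples.
```

The key design choice the abstract highlights is that diffusion operates entirely in the continuous latent space, so the pre-trained decoder handles the mapping back to discrete tokens rather than the diffusion model itself.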