Can continuous diffusion models bring to natural language the same performance breakthrough they achieved in image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and enables learning flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable to those produced by standard autoregressive language models, while being, in theory, more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.
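To make the core mechanism concrete, below is a minimal sketch of one training step for continuous diffusion on token embeddings with self-conditioning. It is an illustration under stated assumptions, not the paper's exact recipe: the small MLP `denoiser` stands in for a Transformer denoiser, and the cosine noise schedule, the 50% self-conditioning rate, and the plain MSE objective on clean embeddings are assumptions chosen for brevity.

```python
# Minimal sketch: self-conditioned diffusion on token embeddings (assumptions noted above).
import math
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN = 32000, 256, 64

embed = nn.Embedding(VOCAB, DIM)             # maps discrete tokens to continuous vectors
denoiser = nn.Sequential(                    # stand-in for a Transformer denoiser; input is
    nn.Linear(2 * DIM + 1, 512), nn.GELU(),  # [noisy x_t, previous x_0 estimate, timestep]
    nn.Linear(512, DIM),
)

def alpha_bar(t):
    """Cosine schedule: cumulative signal level at continuous time t in [0, 1] (an assumption)."""
    return torch.cos(0.5 * math.pi * t) ** 2

def training_step(tokens):
    x0 = embed(tokens)                                   # (batch, seq, dim) clean embeddings
    t = torch.rand(tokens.shape[0], 1, 1)                # one diffusion time per sequence
    a = alpha_bar(t)
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise          # forward diffusion in embedding space

    # Self-conditioning: half the time, first predict x_0 without conditioning,
    # then feed that (detached) estimate back in as an extra input.
    x0_est = torch.zeros_like(x0)
    if torch.rand(()) < 0.5:
        inp = torch.cat([xt, x0_est, t.expand(-1, SEQ_LEN, 1)], dim=-1)
        x0_est = denoiser(inp).detach()
    inp = torch.cat([xt, x0_est, t.expand(-1, SEQ_LEN, 1)], dim=-1)
    x0_pred = denoiser(inp)

    return ((x0_pred - x0) ** 2).mean()                  # regress the clean embeddings

loss = training_step(torch.randint(0, VOCAB, (8, SEQ_LEN)))
loss.backward()
```

At sampling time, the same network would be applied iteratively from pure Gaussian noise, carrying its running estimate of the clean embeddings as the self-conditioning input, and the final embeddings would be mapped back to tokens (for example by nearest neighbour in the embedding table).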