This survey reviews text-to-image diffusion models against the backdrop of diffusion models becoming popular for a wide range of generative tasks. As a self-contained work, the survey begins with a brief introduction to how a basic diffusion model performs image synthesis, followed by how conditioning or guidance improves learning. Building on that foundation, we review state-of-the-art methods for text-conditioned image synthesis, i.e., text-to-image generation. We further summarize applications beyond text-to-image generation: text-guided creative generation and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions.