Denoising diffusion models (DDMs) have led to staggering performance leaps in image generation, editing, and restoration. However, existing DDMs require very large datasets for training. Here, we introduce a framework for training a DDM on a single image. Our method, which we coin SinDDM, learns the internal statistics of the training image via a multi-scale diffusion process. To drive the reverse diffusion process, we use a lightweight, fully convolutional denoiser conditioned on both the noise level and the scale. This architecture allows generating samples of arbitrary dimensions in a coarse-to-fine manner. As we illustrate, SinDDM generates diverse high-quality samples and is applicable to a wide array of tasks, including style transfer and harmonization. Furthermore, it can easily be guided by external supervision. In particular, we demonstrate text-guided generation from a single image using a pre-trained CLIP model.
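To make the conditioning scheme concrete, the sketch below shows how a fully convolutional denoiser can be conditioned on both a diffusion timestep and a scale index while still accepting inputs of arbitrary spatial size. This is a minimal illustrative example, not the paper's actual architecture: the class name ScaleAwareDenoiser, the embedding sizes, the number of residual blocks, and the 1000-step assumption are all hypothetical choices.

```python
import torch
import torch.nn as nn

class ScaleAwareDenoiser(nn.Module):
    """Minimal sketch of a denoiser conditioned on timestep and scale.

    Hypothetical architecture; SinDDM's real network may differ.
    """

    def __init__(self, channels=64, num_scales=4, emb_dim=64):
        super().__init__()
        # Learned embeddings for the noise level (timestep) and scale index.
        self.t_emb = nn.Embedding(1000, emb_dim)  # assumes <= 1000 diffusion steps
        self.s_emb = nn.Embedding(num_scales, emb_dim)
        self.cond_proj = nn.Linear(2 * emb_dim, channels)
        # Purely convolutional trunk: no downsampling or attention, so the
        # receptive field stays small and any spatial size is a valid input.
        self.in_conv = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(4)
        )
        self.out_conv = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x, t, s):
        # x: noisy image (B, 3, H, W) with arbitrary H, W
        # t: timestep indices (B,); s: scale indices (B,)
        cond = self.cond_proj(torch.cat([self.t_emb(t), self.s_emb(s)], dim=-1))
        h = self.in_conv(x) + cond[:, :, None, None]  # broadcast over space
        for block in self.blocks:
            h = h + torch.relu(block(h))              # residual conv blocks
        return self.out_conv(h)                       # predicted noise

# Because every layer is convolutional, the same weights that denoise one
# image size during training can be applied to a different canvas size at
# sampling time, which is what enables samples of arbitrary dimensions.
model = ScaleAwareDenoiser()
x = torch.randn(1, 3, 128, 192)
eps = model(x, torch.tensor([500]), torch.tensor([2]))
assert eps.shape == x.shape
```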