Diffusion models have demonstrated impressive image generation performance and have been used in various computer vision tasks. Unfortunately, image generation with diffusion models is very time-consuming, since it requires thousands of sampling steps. To address this problem, we present a novel pyramidal diffusion model that generates high-resolution images starting from much coarser-resolution images, using a single score function trained with a positional embedding. This enables time-efficient sampling for image generation and also solves the low batch size problem that arises when training with limited resources. Furthermore, we show that the proposed approach can be used efficiently for multi-scale super-resolution problems with a single score function.
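To illustrate how one score function could be shared across resolutions, the following is a minimal sketch of a resolution-aware positional embedding. The abstract does not specify the embedding actually used, so everything here is an assumption for illustration: the sinusoidal form, the normalised pixel-centre coordinates, and the hypothetical helper names `coord_axis`, `positional_embedding`, and `score_input`.

```python
import math

def coord_axis(resolution):
    """Pixel-centre coordinates in [0, 1] along one axis of a `resolution`-pixel image."""
    return [(i + 0.5) / resolution for i in range(resolution)]

def positional_embedding(resolution, num_freqs=4):
    """Return 4 * num_freqs channels, each a resolution x resolution grid (nested lists).

    Assumed form: sin/cos of the normalised y and x coordinates at
    frequencies pi, 2*pi, 4*pi, ... (not taken from the source).
    """
    axis = coord_axis(resolution)
    channels = []
    for k in range(num_freqs):
        freq = (2 ** k) * math.pi
        for fn in (math.sin, math.cos):
            # One channel that varies along rows (y) ...
            channels.append([[fn(freq * y) for _ in axis] for y in axis])
            # ... and one that varies along columns (x).
            channels.append([[fn(freq * x) for x in axis] for _ in axis])
    return channels

def score_input(noisy_image, num_freqs=4):
    """Stack the image channels with the positional channels.

    `noisy_image` is a list of channels, each a square grid of floats.
    The result is what a (hypothetical) shared score network would receive.
    """
    resolution = len(noisy_image[0])
    return noisy_image + positional_embedding(resolution, num_freqs)

# The same function builds inputs at every level of the pyramid.
for res in (16, 32, 64):
    image = [[[0.0] * res for _ in range(res)] for _ in range(3)]
    stacked = score_input(image)
    print(res, len(stacked))  # 3 image channels + 16 positional channels
```

Because the coordinates are normalised to [0, 1], a coarse pixel's coordinate equals the mean of the coordinates of the fine pixels it covers, so in this sketch the embedding tells the network both where a pixel is and, implicitly, at which scale it is operating.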