Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high-quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality; achieving, for example, an FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.
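To make the procedure concrete, below is a minimal Python sketch of the progressive-distillation outer loop described above: starting from an N-step deterministic teacher sampler, each round trains a student whose single step matches two teacher steps, then promotes the student to teacher and halves N. The helpers make_student and train_distill_round are hypothetical placeholders standing in for student initialization and the distillation training loop, not the paper's actual code.

    def progressive_distillation(teacher, make_student, train_distill_round,
                                 start_steps=8192, end_steps=4):
        # Repeatedly distill an N-step deterministic sampler into an
        # N/2-step student until only `end_steps` steps remain.
        num_steps = start_steps
        while num_steps > end_steps:
            student_steps = num_steps // 2
            # Initialize the student from the current teacher's weights.
            student = make_student(teacher)
            # Train the student so that one of its sampling steps matches
            # the result of two deterministic (DDIM-style) teacher steps.
            student = train_distill_round(teacher, student,
                                          teacher_steps=num_steps,
                                          student_steps=student_steps)
            # The distilled student becomes the teacher for the next round.
            teacher = student
            num_steps = student_steps
        return teacher

Under these assumptions, distilling from 8192 steps down to 4 takes eleven halving rounds, which is why the total distillation cost stays comparable to training the original model.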