On Distillation of Guided Diffusion Models

Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they are widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. A downside of these models is that they are computationally expensive at inference time, since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To address this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from. Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then progressively distill that model into a diffusion model that requires far fewer sampling steps. For standard diffusion models trained in pixel space, our approach generates images visually comparable to those of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to those of the original model while being up to 256 times faster to sample from. For diffusion models trained in latent space (e.g., Stable Diffusion), our approach generates high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model generates high-quality results using as few as 2-4 denoising steps.
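To make the cost argument concrete, here is a minimal Python sketch of why guided sampling needs two network evaluations per step and how a single distilled model changes the count. It rests on assumptions not stated in the abstract: the guidance rule is the commonly used linear combination with a weight `w`, and `eps_cond`, `eps_uncond`, `network_evals`, and `halving_schedule` are hypothetical toy stand-ins, not the authors' code. The 512-step baseline is only an illustrative figure, and halving the step count each round is the usual form of progressive distillation, assumed here.

```python
from typing import List


def eps_cond(z: float, t: float, c: int) -> float:
    """Hypothetical conditional noise predictor (a real one is a neural net)."""
    return 0.5 * z + 0.1 * c


def eps_uncond(z: float, t: float) -> float:
    """Hypothetical unconditional noise predictor."""
    return 0.5 * z


def guided_eps(z: float, t: float, c: int, w: float) -> float:
    """Assumed guidance rule: (1 + w) * conditional - w * unconditional.

    Each call evaluates BOTH networks, which is the per-step cost
    the abstract describes.
    """
    return (1.0 + w) * eps_cond(z, t, c) - w * eps_uncond(z, t)


def network_evals(steps: int, evals_per_step: int) -> int:
    """Total network evaluations for one sample."""
    return steps * evals_per_step


def halving_schedule(start_steps: int, target_steps: int) -> List[int]:
    """Step counts visited if each distillation round halves the sampler."""
    schedule = [start_steps]
    while schedule[-1] > target_steps:
        schedule.append(schedule[-1] // 2)
    return schedule


if __name__ == "__main__":
    # With w = 0 the guided prediction reduces to the conditional one.
    print(guided_eps(2.0, 0.5, 3, 0.0))
    # Illustrative cost ratio: 512 guided steps (2 evals each) vs 4 distilled steps.
    print(network_evals(512, 2) // network_evals(4, 1))
    print(halving_schedule(512, 4))
```

Under these assumed numbers, the two-model sampler costs 1024 evaluations against 4 for the distilled one. This is one way a ratio of the size quoted in the abstract could arise, but the abstract itself does not state the baseline configuration.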

