Diffusion models have recently achieved great success in synthesizing diverse and high-fidelity images. However, sampling speed and memory consumption remain major barriers to their practical adoption, since generation requires iterative noise estimation with large neural networks. We address this problem by compressing the noise-estimation network with post-training quantization (PTQ) to accelerate generation. Existing PTQ approaches cannot effectively handle the output distributions of the noise-estimation network, which change across time steps; we therefore formulate a PTQ method tailored to the multi-time-step structure of diffusion models, with a calibration scheme that draws data sampled from different time steps. Experimental results show that our method directly quantizes full-precision diffusion models to 8-bit or 4-bit in a training-free manner while maintaining comparable performance, achieving a FID change of at most 1.88. Our approach also applies to text-guided image generation, and for the first time we can run Stable Diffusion with 4-bit weights without losing much perceptual quality, as shown in Figure 5 and Figure 9.
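A minimal sketch of the idea behind time-step-aware calibration, under assumed shapes and a placeholder sampler update (this is not the authors' exact procedure): inputs to the noise-estimation network are collected at several points along the denoising trajectory and pooled into one calibration set that is then used to set quantization ranges. The function names, the selected steps, and the simplified update rule are hypothetical.

```python
import torch

@torch.no_grad()
def collect_calibration_data(model, timesteps, n_samples=64,
                             steps_to_keep=(0, 250, 500, 750, 999),
                             image_shape=(3, 32, 32)):
    """Gather (x_t, t) pairs seen by the noise estimator across time steps."""
    calib_inputs, calib_timesteps = [], []
    x = torch.randn(n_samples, *image_shape)           # start from pure noise
    for t in sorted(timesteps, reverse=True):
        t_batch = torch.full((n_samples,), t, dtype=torch.long)
        eps = model(x, t_batch)                         # full-precision forward pass
        if t in steps_to_keep:                          # keep inputs from selected steps
            calib_inputs.append(x.clone())
            calib_timesteps.append(t_batch.clone())
        x = x - 0.01 * eps                              # placeholder update; a real sampler uses the noise schedule
    return torch.cat(calib_inputs), torch.cat(calib_timesteps)

@torch.no_grad()
def calibrate_activation_range(model, calib_x, calib_t):
    """Estimate per-tensor min/max for uniform activation quantization."""
    acts = model(calib_x, calib_t)
    return acts.min().item(), acts.max().item()
```

Because the calibration set mixes activations from early and late time steps, the resulting quantization ranges cover the full spread of output distributions rather than a single step's statistics.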