We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as to the preconditioning of the score networks. Together, our improvements yield a new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of an existing ImageNet-64 model from 2.07 to a near-SOTA 1.55.