We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet 256$\times$256, and 7.72 on ImageNet 512$\times$512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256$\times$256 and 3.85 on ImageNet 512$\times$512. We release our code at https://github.com/openai/guided-diffusion
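The classifier-guidance step described above shifts the reverse-process mean by the gradient of a classifier's log-probability for the target class, $\mu + s \Sigma \nabla_{x_t} \log p_\phi(y \mid x_t)$. The following is a minimal PyTorch sketch of that update, not the released implementation; it assumes a classifier trained on noisy images with the hypothetical signature `classifier(x_t, t) -> logits`, and the helper name `classifier_guided_mean` is ours.

```python
import torch
import torch.nn.functional as F

def classifier_guided_mean(mean, variance, x_t, t, y, classifier, guidance_scale=1.0):
    """Shift the diffusion model's predicted mean using classifier gradients.

    Sketch of classifier guidance: the reverse-process mean is nudged in the
    direction that increases log p(y | x_t) under a noisy-image classifier,
    scaled by the step's variance and a guidance scale. `classifier` is
    assumed (hypothetically) to take (x_t, t) and return class logits.
    """
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = F.log_softmax(logits, dim=-1)
        # Sum the log-probabilities of each sample's target class so a single
        # backward pass yields per-sample gradients.
        selected = log_probs[range(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    # Guided mean: mu + s * Sigma * grad_x log p(y | x_t)
    return mean + guidance_scale * variance * grad
```

Larger values of `guidance_scale` push samples toward the classifier's notion of the target class, trading diversity for fidelity as described above.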