The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, we find that our diffusion-based approach has stronger multimodal relational reasoning abilities than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Even though these models are trained with weak augmentations and no regularization, they approach the performance of SOTA discriminative classifiers. Overall, our results are a step toward using generative over discriminative models for downstream tasks. Results and visualizations at https://diffusion-classifier.github.io/
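The abstract does not spell out the classification rule, so the following is only a minimal sketch of the idea. It relies on the standard connection between the denoising loss and a bound on log p(x | c): with a uniform prior over classes, the class whose conditioning gives the lowest expected noise-prediction error is chosen. The toy setup is an assumption made for illustration, not the authors' implementation: each class is a 2-D Gaussian, so the ideal noise predictor has a closed form and stands in for a real network such as Stable Diffusion. All names (`predict_noise`, `classify`, `CLASS_MEANS`, `DATA_STD`) are hypothetical.

```python
import numpy as np

# Standard DDPM-style linear beta schedule (an assumed choice for this sketch).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

# Hypothetical toy "dataset": class c generates x0 ~ N(CLASS_MEANS[c], DATA_STD^2 I).
CLASS_MEANS = np.array([[-2.0, -2.0], [2.0, 2.0], [2.0, -2.0]])
DATA_STD = 0.5


def predict_noise(x_t, t, class_id):
    """Closed-form E[eps | x_t, class] for the Gaussian toy classes.

    Stands in for a trained class-conditional noise-prediction network.
    """
    a = alphas_bar[t][:, None]                 # shape (n, 1)
    var_xt = a * DATA_STD**2 + (1.0 - a)       # per-coordinate variance of x_t
    centered = x_t - np.sqrt(a) * CLASS_MEANS[class_id]
    return np.sqrt(1.0 - a) * centered / var_xt


def classify(x0, n_samples=256, seed=0):
    """Return (predicted class, per-class mean squared noise-prediction error)."""
    rng = np.random.default_rng(seed)
    # The same (t, eps) pairs are reused for every class, so the comparison
    # between classes is not dominated by sampling noise.
    t = rng.integers(0, T, size=n_samples)
    eps = rng.standard_normal((n_samples, x0.shape[0]))
    a = alphas_bar[t][:, None]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps   # forward diffusion of x0

    errors = np.array([
        np.mean(np.sum((eps - predict_noise(x_t, t, c)) ** 2, axis=1))
        for c in range(len(CLASS_MEANS))
    ])
    return int(np.argmin(errors)), errors


if __name__ == "__main__":
    pred, errors = classify(np.array([1.9, 2.1]))
    print("predicted class:", pred)
    print("per-class errors:", errors)
```

With a real text-to-image model, `predict_noise` would be replaced by a forward pass conditioned on a text prompt for each candidate class. The cost then scales with the number of classes times the number of sampled (t, eps) pairs.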