Recent image generation models such as Stable Diffusion have exhibited an impressive ability to generate fairly realistic images starting from a simple text prompt. Could such models render real images obsolete for training image prediction models? In this paper, we answer part of this provocative question by investigating the need for real images when training models for ImageNet classification. Provided only with the class names that were used to build the dataset, we explore the ability of Stable Diffusion to generate synthetic clones of ImageNet and measure how useful these are for training classification models from scratch. We show that with minimal and class-agnostic prompt engineering, ImageNet clones are able to close a large part of the gap between models trained on synthetic images and models trained on real images, on the several standard classification benchmarks that we consider in this study. More importantly, we show that models trained on synthetic images exhibit strong generalization properties and perform on par with models trained on real data for transfer. Project page: https://europe.naverlabs.com/imagenet-sd/