Deep generative models are becoming increasingly powerful, now generating diverse, high-fidelity, photo-realistic samples from text prompts. Have they reached the point where models of natural images can be used for generative data augmentation, helping to improve challenging discriminative tasks? We show that large-scale text-to-image diffusion models can be fine-tuned to produce class-conditional models with SOTA FID (1.76 at 256x256 resolution) and Inception Score (239 at 256x256 resolution). The model also yields a new SOTA in Classification Accuracy Scores (64.96 for 256x256 generated samples, improving to 69.24 for 1024x1024 samples). Augmenting the ImageNet training set with samples from the resulting models yields significant improvements in ImageNet classification accuracy over strong ResNet and Vision Transformer baselines.