Diffusion models excel at generating photorealistic images from text queries. Naturally, many approaches have been proposed to use these generative abilities to augment training datasets for downstream tasks such as classification. However, diffusion models are themselves trained on large, noisily supervised, but nonetheless annotated datasets. It is an open question whether the generalization capabilities of diffusion models, beyond merely transferring the additional pre-training data to the downstream task, lead to improved downstream performance. We perform a systematic evaluation of existing methods for generating images from diffusion models and study new extensions to assess their benefit for data augmentation. While we find that personalizing diffusion models towards the target data outperforms simpler prompting strategies, we also show that using the training data of the diffusion model alone, via a simple nearest neighbor retrieval procedure, leads to even stronger downstream performance. Overall, our study probes the limitations of diffusion models for data augmentation but also highlights their potential for generating new training data to improve performance on simple downstream vision tasks.
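To make the retrieval baseline concrete, here is a minimal sketch of nearest neighbor retrieval over a pre-training corpus. It assumes images from the diffusion model's training set have already been embedded (e.g., with an image encoder such as CLIP) and uses a flat FAISS inner-product index over L2-normalized vectors, which is equivalent to cosine similarity; the paper's exact retrieval setup may differ, and the random vectors below merely stand in for real features.

```python
import numpy as np
import faiss  # pip install faiss-cpu


def build_index(pretrain_embeddings: np.ndarray) -> faiss.Index:
    """Index pre-training image embeddings for cosine-similarity search."""
    faiss.normalize_L2(pretrain_embeddings)  # unit vectors: inner product == cosine
    index = faiss.IndexFlatIP(pretrain_embeddings.shape[1])
    index.add(pretrain_embeddings)
    return index


def retrieve_augmentations(index: faiss.Index,
                           target_embeddings: np.ndarray,
                           k: int = 10):
    """For each target-task image, fetch the k nearest pre-training images."""
    faiss.normalize_L2(target_embeddings)
    scores, ids = index.search(target_embeddings, k)
    return scores, ids


# Toy usage: random vectors as placeholders for real CLIP features.
rng = np.random.default_rng(0)
pretrain = rng.standard_normal((100_000, 512)).astype("float32")
target = rng.standard_normal((16, 512)).astype("float32")

idx = build_index(pretrain)
scores, ids = retrieve_augmentations(idx, target, k=5)
print(ids.shape)  # (16, 5): indices of retrieved pre-training images per query
```

The retrieved images would then be added to the downstream training set, in place of (or alongside) synthetic samples from the diffusion model.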