With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts and what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset. DiffusionDB contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users. We analyze prompts in the dataset and discuss key properties of these prompts. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.