Recent advances in diffusion models have enabled text-to-image generation systems to synthesize high-quality images that are faithful to a given reference text, known as a "prompt." Once released to the public, these systems immediately attracted considerable attention from researchers, creators, and everyday users. Despite extensive efforts to improve the underlying generative models, little work has examined the information needs of the real users of these systems, e.g., by investigating the prompts users input at scale. In this paper, we conduct a comprehensive analysis of large-scale prompt logs collected from multiple text-to-image generation systems. Our work is analogous to analyzing the query logs of Web search engines, a line of work that has contributed critically to the success of the Web search industry and research. We analyze over two million user-input prompts submitted to three popular text-to-image systems. Compared to Web search queries, text-to-image prompts are significantly longer, often organized into unique structures, and present different categories of information needs. Users tend to make more edits within creation sessions, showing remarkable exploratory patterns. Our findings provide concrete implications for how to improve text-to-image generation systems for creation purposes.