Recent developments in large language models (LLMs) and generative AI have unleashed the astonishing capabilities of text-to-image generation systems, which synthesize high-quality images that are faithful to a given reference text, known as a "prompt". These systems have quickly attracted considerable attention from researchers, creators, and everyday users. Despite extensive efforts to improve the generative models, there is limited work on understanding the information needs of these systems' users at scale. We conduct the first comprehensive analysis of large-scale prompt logs collected from multiple text-to-image generation systems. Our work is analogous to analyzing the query logs of Web search engines, a line of work that has contributed critically to the success of the Web search industry and research. Compared with Web search queries, text-to-image prompts are significantly longer, are often organized into special structures consisting of the subject, form, and intent of the generation task, and present unique categories of information needs. Users also make more edits within creation sessions, which exhibit remarkable exploratory patterns. There is also a considerable gap between user-input prompts and the captions of the images included in the open training data of the generative models. Our findings provide concrete implications for improving text-to-image generation systems for creation purposes.