Text-to-image generative models are a new and powerful way to generate visual artwork. The free-form nature of text as interaction is double-edged; while users have access to an infinite range of generations, they also must engage in brute-force trial and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt components and model parameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style and investigate success and failure modes within these dimensions. Our evaluation of 5493 generations over the course of five experiments spans 49 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people find better outcomes from text-to-image generative models.