An Exploration of Default Images in Text-to-Image Generation

In the creative practice of text-to-image (TTI) generation, images are synthesized from textual prompts. By design, TTI models always yield an output, even if the prompt contains unknown terms. In this case, the model may generate default images: images that closely resemble each other across many unrelated prompts. Studying default images is valuable for designing better solutions for prompt engineering and TTI generation. We present the first investigation into default images on Midjourney. We describe an initial study in which we manually created input prompts triggering default images, and several ablation studies. Building on these, we conduct a computational analysis of over 750,000 images, revealing consistent default images across unrelated prompts. We also conduct an online user study investigating how default images may affect user satisfaction. Our work lays the foundation for understanding default images in TTI generation, highlighting their practical relevance as well as challenges and future research directions.
