The recent surge in generative AI has led to new models being introduced almost every month. In light of this rapid progression, we pose and address a central question: to what extent must prompts evolve as the capabilities of generative AI models advance? To answer this question, we conducted an online experiment with N = 1,983 participants, each of whom was incentivized to write prompts that reproduce a target image as closely as possible in 10 consecutive attempts. Each participant was randomly and blindly assigned to use one of three text-to-image diffusion models: DALL-E 2; its more advanced successor, DALL-E 3; or a version of DALL-E 3 with automatic prompt revision. In total, we collected and analyzed over 18,000 prompts and over 300,000 images. We find that task performance was higher for participants using DALL-E 3 than for those using DALL-E 2. This performance gap corresponds to a noticeable difference in the similarity of participants' images to their target images, and was caused in equal measure by (1) the increased technical capabilities of DALL-E 3 and (2) endogenous changes in participants' prompting in response to these increased capabilities. Furthermore, while participants assigned to DALL-E 3 with prompt revision still outperformed those assigned to DALL-E 2, automatic prompt revision reduced the benefits of using DALL-E 3 by 58%. Our results suggest that for generative AI to realize its full impact on the global economy, people, firms, and institutions will need to update their prompts in response to new models.