We address the task of advertisement image generation and introduce three evaluation metrics that assess Creativity, prompt Alignment, and Persuasiveness (CAP) in generated advertisement images. Despite recent advances in Text-to-Image (T2I) generation and the strong performance of T2I models in producing high-quality images from explicit descriptions, evaluating these models remains challenging. Existing evaluation methods focus largely on alignment with explicit, detailed descriptions, whereas evaluating alignment with visually implicit prompts remains an open problem. Additionally, creativity and persuasiveness are essential to the effectiveness of advertisement images, yet they are seldom measured. To address these gaps, we propose three novel metrics for evaluating the creativity, alignment, and persuasiveness of generated images. Our findings reveal that current T2I models struggle with creativity, persuasiveness, and alignment when the input text is an implicit message. We further introduce a simple yet effective approach that enhances T2I models' ability to produce images that are better aligned, more creative, and more persuasive.