Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment

Despite substantial progress in text-to-image generation, achieving precise text-image alignment remains challenging, particularly for prompts with rich compositional structure or imaginative elements. To address this, we introduce Negative Prompting for Image Correction (NPC), an automated pipeline that improves alignment by identifying and applying negative prompts that suppress unintended content. We begin by analyzing cross-attention patterns to explain why both targeted negatives (those directly tied to the prompt's alignment error) and untargeted negatives (tokens unrelated to the prompt but present in the generated image) can enhance alignment. To discover useful negatives, NPC generates candidate prompts using a verifier-captioner-proposer framework and ranks them with a salient text-space score, enabling effective selection without requiring additional image synthesis. On GenEval++ and Imagine-Bench, NPC outperforms strong baselines, scoring 0.571 vs. 0.371 on GenEval++ and achieving the best overall performance on Imagine-Bench. By guiding what not to generate, NPC provides a principled, fully automated route to stronger text-image alignment in diffusion models. Code is released at https://github.com/wiarae/NPC.
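
As background on how a negative prompt can "suppress unintended content" during sampling, the following is a minimal sketch of the widely used classifier-free guidance convention, in which the noise prediction conditioned on the negative prompt takes the place of the unconditional prediction. This is an illustration under stated assumptions, not NPC's implementation: the abstract does not specify the sampler, and the function name `guided_noise`, the toy arrays, and the guidance scale of 7.5 are all hypothetical.

```python
import numpy as np


def guided_noise(eps_pos: np.ndarray, eps_neg: np.ndarray, scale: float) -> np.ndarray:
    """Combine two noise predictions with classifier-free guidance.

    eps_pos: noise predicted when conditioning on the (positive) prompt.
    eps_neg: noise predicted when conditioning on the negative prompt;
             in this convention it replaces the unconditional prediction.
    scale:   guidance scale; larger values push the result further away
             from eps_neg along the direction (eps_pos - eps_neg).
    """
    return eps_neg + scale * (eps_pos - eps_neg)


# Toy two-dimensional "noise predictions" (hypothetical values, for illustration only).
eps_pos = np.array([1.0, 0.0])
eps_neg = np.array([0.0, 1.0])

out = guided_noise(eps_pos, eps_neg, scale=7.5)
print(out)
```

With `scale=1.0` the function returns `eps_pos` unchanged, and with `scale=0.0` it returns `eps_neg`; values above 1 extrapolate away from whatever the negative prompt describes, which is why the choice of negative prompt matters.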
