Diffusion-based image generation has demonstrated impressive capability, motivating exploration of diverse and specialized applications. Owing to the importance of emotion in advertising, emotion-oriented image generation has attracted increasing attention. However, current emotion-oriented methods suffer from an affective shortcut, in which emotions are approximated by semantics. Two decades of research indicate that emotion is not equivalent to semantics. To address this, we propose Emotion-Director, a cross-modal collaboration framework consisting of two modules. First, we propose a cross-Modal Collaborative diffusion model, abbreviated as MC-Diffusion. MC-Diffusion integrates visual prompts with textual prompts for guidance, enabling the generation of emotion-oriented images beyond semantics. We further improve Direct Preference Optimization (DPO) with a negative visual prompt, which enhances the model's sensitivity to different emotions under the same semantics. Second, we propose MC-Agent, a cross-Modal Collaborative Agent system that rewrites textual prompts to express the intended emotions. To avoid template-like rewrites, MC-Agent employs multiple agents to simulate human subjectivity toward emotions and adopts a chain-of-concept workflow that improves the visual expressiveness of the rewritten prompts. Extensive qualitative and quantitative experiments demonstrate the superiority of Emotion-Director in emotion-oriented image generation.


