Social media platforms enable users to express emotions by posting text with accompanying images. In this paper, we propose the Affective Image Filter (AIF) task, which aims to reflect the visually-abstract emotions conveyed in text into visually-concrete images, thereby creating emotionally compelling results. We first introduce the AIF dataset and the formulation of AIF models. Then, we present AIF-B, an initial attempt based on a multi-modal transformer architecture. After that, we propose AIF-D, an extension of AIF-B toward deeper emotional reflection that effectively leverages generative priors from pre-trained large-scale diffusion models. Quantitative and qualitative experiments demonstrate that AIF models outperform state-of-the-art methods in both content consistency and emotional fidelity. Extensive user studies further show that AIF models are significantly more effective at evoking specific emotions. Based on these results, we comprehensively discuss the value and potential of AIF models.