SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network

Visual Emotion Analysis (VEA) aims to determine how people respond emotionally to different visual stimuli, a problem that has attracted great attention recently with the prevalence of image sharing on social networks. Since human emotion involves a highly complex and abstract cognitive process, it is difficult to infer visual emotions directly from holistic or regional features in affective images. Psychological studies have demonstrated that visual emotions are evoked by the interactions between objects as well as the interactions between objects and scenes within an image. Inspired by this, we propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images. To mine the emotional relationships between distinct objects, we first build an Emotion Graph based on semantic concepts and visual features. Then, we conduct reasoning on the Emotion Graph using a Graph Convolutional Network (GCN), yielding emotion-enhanced object features. We also design a Scene-Object Fusion Module to integrate scenes and objects, which uses scene features to guide the fusion of object features through the proposed scene-based attention mechanism. Extensive experiments and comparisons are conducted on eight public visual emotion datasets, and the results demonstrate that the proposed SOLVER consistently outperforms the state-of-the-art methods by a large margin. Ablation studies verify the effectiveness of our method, and visualizations illustrate its interpretability, which also brings new insight into VEA. Notably, we further evaluate SOLVER on three other potential datasets with extended experiments, where we validate the robustness of our method and identify some of its limitations.
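The pipeline described above (graph over objects, GCN reasoning, scene-guided attention fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-similarity adjacency stands in for the Emotion Graph (which the abstract says also uses semantic concepts), the single GCN layer uses the standard symmetric normalization, the attention is a simple bilinear score followed by a softmax, and all function names, dimensions, and random weights are hypothetical.

```python
import numpy as np

def cosine_adjacency(X):
    """Hypothetical edge weights: non-negative cosine similarity between object features."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    A = np.clip(Xn @ Xn.T, 0.0, None)
    np.fill_diagonal(A, 0.0)  # self-loops are added during normalization
    return A

def normalize_adjacency(A):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, X, W):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)

def scene_attention(H, scene, W_s):
    """Assumed bilinear attention: the scene vector scores each object feature."""
    scores = H @ (W_s @ scene)
    scores = scores - scores.max()  # numerical stability for the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha, alpha @ H  # attention weights and attention-weighted object feature

# Toy inputs: 5 detected objects and one scene vector (all dimensions are illustrative).
rng = np.random.default_rng(0)
num_objects, feat_dim, hidden_dim, scene_dim = 5, 16, 8, 12
X = rng.normal(size=(num_objects, feat_dim))      # object visual features
scene = rng.normal(size=scene_dim)                # scene feature
W = rng.normal(size=(feat_dim, hidden_dim)) * 0.1     # GCN weights (random stand-in)
W_s = rng.normal(size=(hidden_dim, scene_dim)) * 0.1  # attention weights (random stand-in)

A_norm = normalize_adjacency(cosine_adjacency(X))
H = gcn_layer(A_norm, X, W)                       # graph-enhanced object features
alpha, fused = scene_attention(H, scene, W_s)     # scene-guided fusion of objects
image_repr = np.concatenate([scene, fused])       # input to a downstream emotion classifier
```

In a trained model the weights would be learned end to end and `image_repr` would feed an emotion classifier; here the sketch only shows how the three stages connect and which shapes flow between them.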
