Affection: Learning Affective Explanations for Real-World Visual Data

In this work, we explore the emotional reactions that real-world images tend to induce, using natural language as the medium to express the rationale behind an affective response to a given visual stimulus. To embark on this journey, we introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85,007 publicly available images, analyzed by 6,283 annotators who were asked to indicate and explain how and why they felt a particular way when observing a specific image, producing a total of 526,749 responses. Even though emotional reactions are subjective and sensitive to context (personal mood, social status, past experiences), we show that there is significant common ground to capture potentially plausible emotional responses with large support in the subject population. In light of this crucial observation, we ask the following questions: i) Can we develop multi-modal neural networks that provide reasonable affective responses to real-world visual data, explained with language? ii) Can we steer such methods towards producing explanations with varying degrees of pragmatic language, or towards justifying different emotional reactions, while adapting to the underlying visual stimulus? Finally, iii) How can we evaluate the performance of such methods for this novel task? With this work, we take the first steps in addressing all of these questions, thus paving the way for richer, more human-centric, and emotionally-aware image analysis systems. Our introduced dataset and all developed methods are available at https://affective-explanations.org
