Recognizing the emotional state of people is a basic but challenging task in video understanding. In this paper, we propose a new task in this field, named Pairwise Emotional Relationship Recognition (PERR). The task aims to recognize the emotional relationship between two interacting characters in a given video clip, and it differs from traditional emotion recognition and social relation recognition tasks. Various kinds of information, including character appearance, behaviors, facial emotions, dialogues, background music, and subtitles, contribute differently to the final result, which makes the task more challenging but also meaningful for developing more advanced multi-modal models. To facilitate the task, we develop a new dataset called Emotional RelAtionship of inTeractiOn (ERATO), based on dramas and movies. ERATO is a large-scale multi-modal dataset for the PERR task, containing 31,182 video clips that last about 203 video hours in total. Unlike existing datasets, ERATO contains interaction-centric videos with multiple shots, varied video lengths, and multiple modalities including visual, audio, and text. As a minor contribution, we propose a baseline model composed of Synchronous Modal-Temporal Attention (SMTA) units to fuse the multi-modal information for the PERR task. Compared with other prevailing attention mechanisms, the proposed SMTA consistently improves performance by about 1\%. We expect ERATO and the proposed SMTA to open up a new direction for the PERR task in video understanding and to further advance research on multi-modal fusion methodology.