Group-level emotion recognition (ER) is a growing research area, driven by the demand for assessing crowds of all sizes in both security applications and social media. This work extends earlier ER investigations, which focused on group-level ER in either single images or individual videos, by fully investigating group-level expression recognition on crowd videos. In this paper, we propose an effective deep feature-level fusion mechanism to model the spatial-temporal information in crowd videos. In our approach, fusion is performed in the deep feature domain by a generative probabilistic model, Non-Volume Preserving Fusion (NVPF), that models spatial relationships among features. Furthermore, we extend the proposed spatial NVPF to a spatial-temporal NVPF (TNVPF) that learns the temporal information between frames. To demonstrate the robustness and effectiveness of each component of the proposed approach, three experiments were conducted: (i) evaluation on the AffectNet database to benchmark the proposed EmoNet for facial expression recognition; (ii) evaluation on EmotiW2018 to benchmark the proposed deep feature-level fusion mechanism NVPF; and (iii) evaluation of the proposed TNVPF on a new Group-level Emotion on Crowd Videos (GECV) dataset composed of 627 videos collected from publicly available sources. The GECV dataset is a collection of videos containing crowds of people, where each video is labeled with emotion categories at three levels: individual faces, groups of people, and the entire video frame.
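To make the fusion idea concrete, the following is a minimal sketch (not the authors' code) of how a non-volume-preserving, RealNVP-style affine coupling block could fuse per-face deep features into a single group-level embedding. The class names, feature dimension, number of blocks, and the mean-pooling aggregation step are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch, assuming per-face deep features from an EmoNet-like backbone;
# all layer sizes and the pooling step are assumptions, not the paper's design.
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """Non-volume-preserving (affine) coupling: half of the feature vector is
    transformed with a scale/shift predicted from the other half."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        scale, shift = self.net(x1).chunk(2, dim=-1)
        scale = torch.tanh(scale)               # keep scales bounded
        y2 = x2 * torch.exp(scale) + shift      # non-volume-preserving map
        log_det = scale.sum(dim=-1)             # log |det Jacobian| of the transform
        return torch.cat([x1, y2], dim=-1), log_det


class SpatialNVPFusion(nn.Module):
    """Fuse N per-face deep features into one group-level embedding by
    mean-pooling, then passing the pooled vector through stacked couplings."""
    def __init__(self, feat_dim: int = 512, n_blocks: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList([AffineCoupling(feat_dim) for _ in range(n_blocks)])

    def forward(self, face_feats):              # face_feats: (B, n_faces, feat_dim)
        z = face_feats.mean(dim=1)              # simple spatial aggregation (assumption)
        total_log_det = 0.0
        for blk in self.blocks:
            z, log_det = blk(z)
            total_log_det = total_log_det + log_det
        return z, total_log_det                 # fused embedding + flow log-det


if __name__ == "__main__":
    feats = torch.randn(2, 5, 512)              # 2 frames, 5 detected faces each
    fused, log_det = SpatialNVPFusion()(feats)
    print(fused.shape, log_det.shape)           # torch.Size([2, 512]) torch.Size([2])
```

A temporal extension in the spirit of TNVPF could, for example, apply the same coupling blocks to a sequence of per-frame fused embeddings, but the exact temporal modeling is described in the body of the paper rather than here.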