In this paper, we consider the problem of real-time video-based facial emotion analytics, namely, facial expression recognition, prediction of valence and arousal, and detection of action units. We propose a novel frame-level emotion recognition algorithm that extracts facial features with a single EfficientNet model pre-trained on AffectNet. As a result, our approach may be implemented even for video analytics on mobile devices. Experimental results on the large-scale Aff-Wild2 database from the third Affective Behavior Analysis in-the-wild (ABAW) Competition demonstrate that our simple model is significantly better than the VggFace baseline. In particular, our method achieves 0.15-0.2 higher performance measures on the validation sets of the uni-task Expression Classification, Valence-Arousal Estimation, and Action Unit Detection sub-challenges. Due to its simplicity, our approach may be considered as a new baseline for all four sub-challenges.