Analyzing human affect is vital for human-computer interaction systems. Most existing methods are developed in restricted scenarios, which makes them impractical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal, multi-task learning method that uses both visual and audio information. We train the model with both AU and expression annotations and apply a sequence model to further capture associations between video frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set. These results demonstrate the effectiveness of our approach in improving model performance.
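To make the described setup concrete, below is a minimal sketch of a multi-modal, multi-task model of the kind outlined above: per-frame visual and audio features are concatenated, passed through a sequence model, and fed to two task heads trained jointly on AU and expression annotations. The feature dimensions, the GRU as the sequence model, and the head sizes (12 AUs, 7 expression categories) are assumptions for illustration, not details specified in the abstract.

```python
# Hypothetical sketch only; the abstract does not specify the exact backbone,
# feature dimensions, or sequence model.
import torch
import torch.nn as nn

class MultiModalMultiTaskModel(nn.Module):
    def __init__(self, visual_dim=512, audio_dim=128, hidden_dim=256,
                 num_aus=12, num_expressions=7):
        super().__init__()
        # Sequence model over concatenated per-frame visual + audio features
        self.gru = nn.GRU(visual_dim + audio_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        # Task-specific heads trained jointly (multi-task learning)
        self.au_head = nn.Linear(2 * hidden_dim, num_aus)            # multi-label AU detection
        self.expr_head = nn.Linear(2 * hidden_dim, num_expressions)  # single-label expression

    def forward(self, visual_feats, audio_feats):
        # visual_feats: (batch, frames, visual_dim); audio_feats: (batch, frames, audio_dim)
        x = torch.cat([visual_feats, audio_feats], dim=-1)
        x, _ = self.gru(x)                # captures associations between video frames
        au_logits = self.au_head(x)       # pair with BCEWithLogitsLoss
        expr_logits = self.expr_head(x)   # pair with CrossEntropyLoss
        return au_logits, expr_logits

# Example usage with random tensors standing in for extracted features
model = MultiModalMultiTaskModel()
v = torch.randn(2, 16, 512)
a = torch.randn(2, 16, 128)
au_logits, expr_logits = model(v, a)
print(au_logits.shape, expr_logits.shape)  # (2, 16, 12), (2, 16, 7)
```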