This paper describes our method submitted to the Multi-Task Learning (MTL) Challenge of the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition. Instead of relying on face information alone, we exploit the full information in the provided dataset: the face together with the context around it. We use the InceptionNet V3 model to extract deep features and apply an attention mechanism to refine them. The refined features are then fed into a transformer block and multi-layer perceptron networks to produce the final predictions for multiple kinds of emotion: our model regresses valence and arousal, classifies the emotional expression, and estimates the action units simultaneously. The proposed system achieves a performance of 0.917 on the MTL Challenge validation dataset.