As various facial expression databases have become accessible over the last few decades, the Facial Expression Recognition (FER) task has attracted considerable interest. The multiple sources of the available databases raise several challenges for the FER task, and these challenges are usually addressed with Convolutional Neural Network (CNN) architectures. Unlike CNN models, the Transformer, a model based on the attention mechanism, has recently been introduced to address vision tasks. One major issue with Transformers is their need for large amounts of training data, while most FER databases are limited compared to those of other vision applications. Therefore, in this paper we propose to learn a Vision Transformer jointly with a Squeeze-and-Excitation (SE) block for the FER task. The proposed method is evaluated on several publicly available FER databases, including CK+, JAFFE, RAF-DB and SFEW. Experiments demonstrate that our model outperforms state-of-the-art methods on CK+ and SFEW and achieves competitive results on JAFFE and RAF-DB.
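The abstract does not detail how the SE block is coupled to the Vision Transformer. The sketch below is a minimal, hypothetical PyTorch illustration, assuming a standard channel-wise SE block (Hu et al.) that recalibrates the patch-token features of a generic ViT encoder before classification over the seven basic expressions; all names, layer sizes, and the reduction ratio of 16 are illustrative assumptions rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation over the channel (embedding) dimension.

    Standard SE recipe: squeeze via global average pooling, excite via a
    two-layer bottleneck MLP with sigmoid gating. The reduction ratio of
    16 is the usual default, assumed here, not taken from the paper.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, channels) -- ViT patch-token features.
        s = x.mean(dim=1)            # squeeze: average over tokens
        w = self.fc(s)               # excite: per-channel gates in (0, 1)
        return x * w.unsqueeze(1)    # recalibrate every token's channels

class ViTSEForFER(nn.Module):
    """Hypothetical ViT + SE classifier for the 7 basic expressions."""
    def __init__(self, embed_dim: int = 768, num_classes: int = 7):
        super().__init__()
        # Any ViT backbone returning (batch, tokens, embed_dim) would fit;
        # a single Transformer encoder layer stands in for it here.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True)
        self.se = SEBlock(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        feats = self.se(self.encoder(tokens))
        return self.head(feats.mean(dim=1))   # mean-pool tokens, classify

# Usage: 196 patch tokens (a 14x14 grid) from a 224x224 face crop.
logits = ViTSEForFER()(torch.randn(2, 196, 768))
print(logits.shape)  # torch.Size([2, 7])
```

The design choice in this sketch is to squeeze over the token dimension so the learned gates reweight embedding channels globally, mirroring how SE reweights feature-map channels in CNNs.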