Deep neural networks are widely used for feature learning in facial expression recognition systems. However, small datasets and large intra-class variability can lead to overfitting. In this paper, we propose a method that learns an optimized, compact network topology for real-time facial expression recognition from localized facial landmark features. Our method uses a spatio-temporal bilinear layer as its backbone to effectively capture the motion of facial landmarks during the execution of a facial expression. In addition, it uses Monte Carlo Dropout to capture the model's uncertainty, which is important for analyzing and handling uncertain cases. We evaluate the method on three widely used datasets; its performance is comparable to that of video-based state-of-the-art methods, while its complexity is much lower.
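To illustrate the Monte Carlo Dropout idea mentioned in the abstract, here is a minimal NumPy sketch. It is not the authors' implementation: the two-layer network, the random weights, the input size (68 landmarks with 2 coordinates each), the 7 expression classes, and the name `mc_dropout_predict` are all assumptions made for illustration. The sketch keeps dropout active at inference time, runs several stochastic forward passes, averages the class probabilities, and reports the predictive entropy as an uncertainty score.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mc_dropout_predict(x, w1, w2, p_drop=0.5, n_samples=50, rng=None):
    """Hypothetical helper: MC Dropout over a toy two-layer classifier.

    Returns the mean class probabilities and the predictive entropy
    of that mean for each input row.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = []
    for _ in range(n_samples):
        h = np.maximum(x @ w1, 0.0)            # hidden layer with ReLU
        mask = rng.random(h.shape) >= p_drop   # dropout stays ON at test time
        h = h * mask / (1.0 - p_drop)          # inverted-dropout rescaling
        probs.append(softmax(h @ w2))
    probs = np.stack(probs)                    # (n_samples, batch, classes)
    mean_probs = probs.mean(axis=0)
    # Entropy of the averaged prediction; higher means more uncertain.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy


# Toy data: 4 samples of flattened landmark coordinates (assumed 68 x 2).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 136))
w1 = rng.normal(scale=0.1, size=(136, 32))     # random stand-in weights
w2 = rng.normal(scale=0.1, size=(32, 7))       # 7 classes (assumed)
mean_probs, entropy = mc_dropout_predict(x, w1, w2, rng=rng)
```

Samples with high entropy could be flagged for closer inspection rather than classified automatically, which is one way to treat the uncertain cases the abstract refers to.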