Facial Expression Recognition from static images is a challenging problem in computer vision applications. Convolutional Neural Networks (CNNs), the state-of-the-art method for various computer vision tasks, have had limited success in predicting expressions from faces with extreme pose, illumination, and occlusion conditions. To mitigate this issue, CNNs are often accompanied by techniques such as transfer, multi-task, or ensemble learning, which provide high accuracy at the cost of increased computational complexity. In this work, we propose a Part-based Ensemble Transfer Learning network that models how humans recognize facial expressions by correlating the spatial orientation patterns of facial features with a specific expression. It consists of 5 sub-networks, each of which performs transfer learning from one of five subsets of facial landmarks (eyebrows, eyes, nose, mouth, or jaw) to expression classification. We show that our proposed ensemble network uses visual patterns arising from the motor movements of facial muscles to predict expressions, and we demonstrate the usefulness of transfer learning from Facial Landmark Localization to Facial Expression Recognition. We evaluate the proposed network on the CK+, JAFFE, and SFEW datasets, where it outperforms the benchmark on CK+ and JAFFE by 0.51% and 5.34%, respectively. Additionally, the proposed ensemble network comprises only 1.65M model parameters, ensuring computational efficiency during training and real-time deployment. Grad-CAM visualizations of our proposed ensemble highlight the complementary nature of its sub-networks, a key design property of an effective ensemble network. Lastly, cross-dataset evaluation results reveal that our proposed ensemble has a high generalization capacity, making it suitable for real-world usage.
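To make the part-based ensemble design concrete, the sketch below shows one way such a network could be organized in PyTorch: five part-specific sub-networks (eyebrows, eyes, nose, mouth, jaw), each with a trunk that could be pre-trained on landmark localization and a classification head, fused by averaging the per-part logits. This is a minimal illustrative sketch, not the paper's implementation; the layer sizes, crop resolution, number of classes, and averaging fusion are all assumptions introduced here.

```python
# Minimal sketch of a part-based ensemble for expression recognition (assumed
# design, not the authors' code): 5 sub-networks, one per facial-landmark
# region, with logits fused by averaging.
import torch
import torch.nn as nn

PARTS = ["eyebrows", "eyes", "nose", "mouth", "jaw"]


class PartSubNet(nn.Module):
    """One sub-network: a small CNN trunk (a candidate for transfer from
    landmark localization) followed by an expression-classification head.
    Layer sizes are placeholders."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.head(self.trunk(x))


class PartBasedEnsemble(nn.Module):
    """Ensemble of five part-specific sub-networks; predictions are fused by
    averaging the per-part logits (the fusion rule is an assumption)."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.subnets = nn.ModuleDict({p: PartSubNet(num_classes) for p in PARTS})

    def forward(self, crops: dict):
        # crops: {"eyebrows": tensor, "eyes": tensor, ...}, each (B, 1, 48, 48)
        logits = [self.subnets[p](crops[p]) for p in PARTS]
        return torch.stack(logits, dim=0).mean(dim=0)


if __name__ == "__main__":
    model = PartBasedEnsemble()
    batch = {p: torch.randn(4, 1, 48, 48) for p in PARTS}  # dummy part crops
    print(model(batch).shape)  # torch.Size([4, 7])
```

In practice, each sub-network's trunk would first be trained (or fine-tuned) on its landmark-localization task before the classification head is attached, which is the transfer-learning step the abstract describes; the dummy-tensor usage above only checks tensor shapes.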