Facial expressions are among the most common and universal forms of body language. In recent years, automatic facial expression recognition (FER) has been an active field of research; however, it remains a challenging task due to various uncertainties and complications, and efficiency and performance are still essential aspects of building robust systems. In this work, we propose two models, EmoXNet and EmoXNetLite. EmoXNet is an ensemble learning technique for learning convoluted facial representations, whereas EmoXNetLite is a distillation technique that transfers knowledge from our ensemble model to an efficient deep neural network using label-smoothed soft labels, enabling effective real-time expression detection. Both models attained higher accuracy than the models reported to date. The ensemble model (EmoXNet) attained 85.07% test accuracy on FER-2013 with FER+ annotations and 86.25% test accuracy on the Real-world Affective Faces Database (RAF-DB), while the distilled model (EmoXNetLite) attained 82.07% test accuracy on FER-2013 with FER+ annotations and 81.78% test accuracy on RAF-DB. The results show that our models generalize well to new data and learn to focus on facial representations relevant to expression recognition.
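To illustrate the kind of distillation objective described above, the following is a minimal sketch assuming a PyTorch setup; the function name, temperature, smoothing factor, and loss weighting are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch (assumed PyTorch; names and hyperparameters are illustrative,
# not the authors' actual implementation).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, smoothing=0.1, alpha=0.7):
    """Distill an ensemble teacher into a compact student using
    label-smoothed soft labels plus a standard cross-entropy term."""
    num_classes = student_logits.size(-1)

    # Teacher soft labels at the chosen temperature.
    soft_labels = F.softmax(teacher_logits / temperature, dim=-1)

    # Label smoothing: blend the teacher distribution with a uniform prior.
    soft_labels = (1.0 - smoothing) * soft_labels + smoothing / num_classes

    # KL divergence between the student and the smoothed teacher distribution.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(log_student, soft_labels,
                       reduction="batchmean") * temperature ** 2

    # Hard-label cross-entropy on the ground-truth expression labels.
    ce_term = F.cross_entropy(student_logits, targets)

    return alpha * kd_term + (1.0 - alpha) * ce_term
```

Under these assumptions, the student is trained on a weighted combination of the smoothed teacher distribution and the ground-truth labels, which is one common way to realize distillation with label-smoothed soft targets.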