Facial emotion recognition (FER) is important for human-computer interaction, with applications such as clinical practice and behavioral description. Accurate and robust FER by computer models remains challenging because human faces are heterogeneous and images vary in factors such as facial pose and lighting. Among FER techniques, deep learning models, especially convolutional neural networks (CNNs), have shown great potential owing to their powerful automatic feature extraction and computational efficiency. In this work, we adopt the VGGNet architecture, rigorously fine-tune its hyperparameters, and experiment with various optimization methods. To the best of our knowledge, our model achieves a state-of-the-art single-network accuracy of 73.28% on the FER2013 dataset without using extra training data.