In this paper, multi-task learning of lightweight convolutional neural networks is studied for face identification and classification of facial attributes (age, gender, ethnicity), with the networks trained on cropped faces without margins. The need to fine-tune these networks for facial expression prediction is highlighted. Several models based on the MobileNet, EfficientNet, and RexNet architectures are presented. It is experimentally demonstrated that they achieve near state-of-the-art results for age, gender, and race recognition on the UTKFace dataset and for emotion classification on the AffectNet dataset. Moreover, it is shown that using the trained models as feature extractors for facial regions in video frames yields 4.5% higher accuracy than the previously known state-of-the-art single models on the AFEW and VGAF datasets from the EmotiW challenges. The models and source code are publicly available at https://github.com/HSE-asavchenko/face-emotion-recognition.
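As a rough illustration of the feature-extraction usage described above, the sketch below loads a generic EfficientNet-B0 backbone via timm and computes an embedding for a cropped face image. This is a minimal sketch, not the authors' released models or exact pipeline: the ImageNet weights, the `cropped_face.jpg` path, and the preprocessing parameters are assumptions for demonstration only; the actual pretrained multi-task models are distributed in the repository linked above.

```python
import timm
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical stand-in backbone: EfficientNet-B0 with the classifier head
# removed (num_classes=0), so the forward pass returns pooled features.
model = timm.create_model('efficientnet_b0', pretrained=True, num_classes=0)
model.eval()

# Assumed preprocessing (ImageNet normalization); the paper's models use
# tightly cropped faces without margins as input.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# 'cropped_face.jpg' is a placeholder for a detected and cropped facial region,
# e.g. one face per video frame.
face = Image.open('cropped_face.jpg').convert('RGB')
with torch.no_grad():
    embedding = model(preprocess(face).unsqueeze(0))  # shape: (1, 1280)

print(embedding.shape)
```

In a video pipeline such as the one evaluated on AFEW and VGAF, per-frame embeddings of this kind would then be aggregated (e.g., averaged over frames) and passed to a downstream emotion classifier.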