In this paper, we first examined the impact of ImageNet pre-training on Facial Expression Recognition (FER) under different augmentation levels. The results showed that training from scratch can outperform ImageNet fine-tuning when stronger augmentation is applied. We then proposed Hybrid Learning (HL), a framework that extends standard Supervised Learning (SL) by co-training a self-supervised task alongside the supervised one in a Multi-Task Learning (MTL) manner. Leveraging Self-Supervised Learning (SSL) extracts additional information from the input data, such as the spatial structure of faces, which supports the main SL task. We investigated how this approach can be applied to FER using self-supervised pretext tasks such as Jigsaw puzzling and in-painting. Under identical training settings, both pretext tasks helped the supervised head (SH) reduce the error rate across different augmentation levels and in low-data regimes. State-of-the-art performance on AffectNet was reached with two completely different HL methods, without using additional datasets. Moreover, HL was shown to benefit two other facial-analysis problems, head-pose estimation and gender recognition, reducing the error rate by up to 9% and 1%, respectively. Finally, we observed that the HL methods helped prevent the model from overfitting.
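To make the MTL setup concrete, the following is a minimal PyTorch sketch of the Hybrid Learning idea: one shared backbone feeding a supervised head for expression classification and a self-supervised head for a Jigsaw-style pretext task, trained with a combined loss. The tiny backbone, layer sizes, number of Jigsaw permutations, and the loss weight `ssl_weight` are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class HybridModel(nn.Module):
    """Shared backbone with a supervised head (SH) and an SSL pretext head."""
    def __init__(self, num_expressions=8, num_jigsaw_perms=24, ssl_weight=0.5):
        super().__init__()
        self.backbone = nn.Sequential(                      # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.supervised_head = nn.Linear(64, num_expressions)  # FER labels
        self.ssl_head = nn.Linear(64, num_jigsaw_perms)        # Jigsaw permutation labels
        self.ssl_weight = ssl_weight
        self.ce = nn.CrossEntropyLoss()

    def forward(self, x, x_jigsaw):
        feat_sup = self.backbone(x)          # original face -> supervised head
        feat_ssl = self.backbone(x_jigsaw)   # tile-shuffled face -> SSL head
        return self.supervised_head(feat_sup), self.ssl_head(feat_ssl)

    def loss(self, logits_sup, y, logits_ssl, perm_idx):
        # Multi-task objective: supervised loss plus weighted pretext loss.
        return self.ce(logits_sup, y) + self.ssl_weight * self.ce(logits_ssl, perm_idx)

# Toy usage with random tensors standing in for faces and their shuffled versions.
model = HybridModel()
x = torch.randn(4, 3, 64, 64)
x_jig = torch.randn(4, 3, 64, 64)
y = torch.randint(0, 8, (4,))
perm = torch.randint(0, 24, (4,))
logits_sup, logits_ssl = model(x, x_jig)
print(model.loss(logits_sup, y, logits_ssl, perm))
```

In this sketch the pretext head only shapes the shared features during training and can be discarded at inference, which is consistent with using SSL purely as an auxiliary signal for the supervised task.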