Over the past few years, the best self-supervised learning (SSL) methods have gradually moved from pretext-task learning to contrastive learning. However, contrastive methods have drawbacks that have not been fully resolved, such as performing poorly on fine-grained visual tasks compared to supervised learning. In this study, we first examined the impact of ImageNet pre-training on fine-grained Facial Expression Recognition (FER). The results show that training from scratch outperforms ImageNet fine-tuning at stronger augmentation levels. We then proposed a framework for standard Supervised Learning (SL), called Hybrid Multi-Task Learning (HMTL), which adds self-supervision as an auxiliary task to the SL training setting. Leveraging SSL can extract additional information from the input data beyond the labels, which helps the main fine-grained SL task. We investigated how this approach can be applied to FER by designing two customized versions of common pretext techniques, jigsaw puzzling and in-painting. State-of-the-art performance on AffectNet was reached via the two types of HMTL, without pre-training on additional datasets. Moreover, we compared self-supervised pre-training with HMTL to demonstrate the superiority of the proposed method. Furthermore, we showed the benefit of the proposed method on two other fine-grained facial tasks, head pose estimation and gender recognition, reducing the error rate by 11% and 1%, respectively.
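To make the HMTL idea concrete, the following is a minimal sketch of how a supervised FER head and an auxiliary self-supervised jigsaw head can share one backbone and be trained with a joint loss. The backbone choice (ResNet-18), the number of jigsaw permutations, and the loss weight lambda_ssl are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal HMTL sketch: shared backbone, supervised FER head + auxiliary jigsaw head.
# Assumptions (not from the paper): ResNet-18 backbone, 24 jigsaw permutations,
# lambda_ssl = 0.5 as the auxiliary loss weight.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HMTLModel(nn.Module):
    def __init__(self, num_expressions=8, num_permutations=24):
        super().__init__()
        backbone = resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()          # shared feature extractor
        self.backbone = backbone
        self.fer_head = nn.Linear(feat_dim, num_expressions)      # main supervised task
        self.jigsaw_head = nn.Linear(feat_dim, num_permutations)  # auxiliary SSL task

    def forward(self, x_orig, x_shuffled):
        # Main branch sees the original face; auxiliary branch sees the tile-shuffled version.
        feat_main = self.backbone(x_orig)
        feat_aux = self.backbone(x_shuffled)
        return self.fer_head(feat_main), self.jigsaw_head(feat_aux)

def hmtl_loss(expr_logits, expr_labels, perm_logits, perm_labels, lambda_ssl=0.5):
    # Joint objective: supervised cross-entropy plus a weighted self-supervised term.
    ce = nn.CrossEntropyLoss()
    return ce(expr_logits, expr_labels) + lambda_ssl * ce(perm_logits, perm_labels)
```

In this sketch both heads are optimized jointly from scratch, which is the key difference from self-supervised pre-training followed by fine-tuning; the in-painting variant would simply swap the jigsaw head for a reconstruction decoder.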