Facial expression recognition (FER) algorithms work well in constrained environments with little or no occlusion of the face. However, face occlusion is prevalent in the real world, most notably owing to the need to wear face masks during the COVID-19 pandemic. While occlusion in FER has been studied, little work has addressed the face mask scenario specifically. Moreover, the few works in this area largely use synthetically created masked FER datasets. Motivated by the challenges the pandemic poses to FER, we present a novel dataset, the Masked Student Dataset of Expressions (MSD-E), consisting of 1,960 real-world non-masked and masked facial expression images collected from 142 individuals. Along with the issue of obfuscated facial features, we illustrate how other, subtler issues in masked FER are represented in our dataset. We then provide baseline results using ResNet-18, finding that its performance dips in the non-masked case when it is trained for FER in the presence of masks. To tackle this, we test two training paradigms, contrastive learning and knowledge distillation, and find that they increase the model's performance in the masked scenario while maintaining its non-masked performance. We further visualise our results using t-SNE plots and Grad-CAM, demonstrating that these paradigms capitalise on the limited features available in the masked scenario. Finally, we benchmark state-of-the-art (SOTA) methods on MSD-E.