Facial action unit (FAU) analysis (e.g., detection and intensity estimation) provides a fine-grained, localized measurement of facial expression behavior, but its annotation is well documented to be time-consuming, labor-intensive, and error-prone. A long-standing challenge of FAU analysis therefore arises from the scarcity of manually annotated data, which severely limits the generalization ability of trained models. Many previous works have attempted to alleviate this issue via semi-/weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and do not escape the heavy dependency on annotated data. This paper introduces MAE-Face, a robust facial representation model for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a feasible collection of face images without additional data annotations. After fine-tuning on AU datasets, MAE-Face exhibits convincing performance on both AU detection and AU intensity estimation, setting a new state of the art on nearly all evaluated benchmarks. Further investigation shows that MAE-Face achieves decent performance even when fine-tuned on only 1\% of the AU training set, demonstrating its robustness and generalization ability.
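For intuition, the following is a minimal PyTorch sketch of the masked-autoencoding pre-training objective described above: patches are randomly masked, only visible patches are encoded, and the decoder reconstructs the masked pixels under an MSE loss. All module sizes, the 75\% mask ratio, and the \texttt{TinyMAE} name are illustrative assumptions rather than the paper's exact recipe; in particular, this sketch omits the decoder's per-position embeddings and token unshuffling used in full MAE implementations.

\begin{verbatim}
# Minimal masked-autoencoding sketch (illustrative; not MAE-Face's exact recipe).
import torch
import torch.nn as nn

PATCH, DIM = 16, 256
N_PATCHES = (224 // PATCH) ** 2  # 196 patches for a 224x224 face image

class TinyMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.patch_embed = nn.Linear(3 * PATCH * PATCH, DIM)
        self.pos = nn.Parameter(torch.zeros(1, N_PATCHES, DIM))
        layer = lambda: nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), num_layers=2)
        self.decoder = nn.TransformerEncoder(layer(), num_layers=1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        self.head = nn.Linear(DIM, 3 * PATCH * PATCH)  # reconstruct raw pixels

    def forward(self, patches, mask_ratio=0.75):
        # patches: (B, N_PATCHES, 3 * PATCH * PATCH) flattened image patches
        B, N, D = patches.shape
        x = self.patch_embed(patches) + self.pos
        # Randomly keep a subset of patches; encode only the visible ones.
        n_keep = int(N * (1 - mask_ratio))
        idx = torch.rand(B, N, device=x.device).argsort(dim=1)
        keep, masked = idx[:, :n_keep], idx[:, n_keep:]
        visible = torch.gather(x, 1, keep.unsqueeze(-1).expand(-1, -1, DIM))
        latent = self.encoder(visible)
        # Decoder sees encoded visible tokens plus mask tokens, predicts pixels.
        full = torch.cat([latent, self.mask_token.expand(B, N - n_keep, -1)], 1)
        pred = self.head(self.decoder(full))[:, n_keep:]  # masked positions only
        target = torch.gather(patches, 1, masked.unsqueeze(-1).expand(-1, -1, D))
        return ((pred - target) ** 2).mean()  # MSE on masked patches

# usage:
#   loss = TinyMAE()(torch.randn(8, N_PATCHES, 3 * PATCH * PATCH))
#   loss.backward()
\end{verbatim}

After this self-supervised stage, the pre-trained encoder would be fine-tuned end-to-end on labeled AU data by replacing the reconstruction head with an AU detection or intensity-regression head.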