Facial expression recognition is a commercially important application, but one under-appreciated limitation is that such applications must make predictions on out-of-sample distributions, where target images have different properties from the images the model was trained on. How well -- or how badly -- do facial expression recognition models perform on unseen target domains? We provide a systematic and critical evaluation of transfer learning -- specifically, domain generalization -- in facial expression recognition. Using a state-of-the-art model with twelve datasets (six collected in-lab and six ``in-the-wild''), we conduct extensive round-robin-style experiments to evaluate classification accuracy on new data from an unseen dataset. We also perform multi-source experiments to examine a model's ability to generalize from multiple source datasets, including (i) within-setting (e.g., lab to lab), (ii) cross-setting (e.g., in-the-wild to lab), and (iii) leave-one-out settings. Finally, we compare our results with three commercially available software packages. The results are sobering: the accuracy of both single- and multi-source domain generalization is only modest. Even for the best-performing multi-source settings, we observe an average classification accuracy of 65.6% (range: 34.6%-88.6%; chance: 14.3%), corresponding to an average drop of 10.8 percentage points from the within-corpus classification performance (mean: 76.4%). We discuss the need for regular, systematic investigations into the generalizability of affective computing models and applications.
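The evaluation protocol summarized above can be illustrated with a minimal sketch that enumerates the experimental settings and checks the reported arithmetic. The dataset names (`lab_1`, `wild_1`, ...) and all function names are hypothetical placeholders, since the abstract does not list the actual corpora. Reading "within-setting" as leave-one-out restricted to a single setting is an assumption, as is reading the 14.3% chance level as a uniform guess over seven classes (1/7).

```python
from itertools import permutations

# Hypothetical placeholder names; the abstract does not name the twelve datasets.
LAB = [f"lab_{i}" for i in range(1, 7)]
WILD = [f"wild_{i}" for i in range(1, 7)]
ALL = LAB + WILD


def single_source_pairs(datasets):
    """Round-robin: every ordered (source, target) pair with source != target."""
    return list(permutations(datasets, 2))


def leave_one_out(datasets):
    """Multi-source: train on all datasets but one, test on the held-out one."""
    return [([d for d in datasets if d != target], target) for target in datasets]


def cross_setting(sources, targets):
    """Multi-source: train on all datasets of one setting, test on each of the other."""
    return [(list(sources), target) for target in targets]


pairs = single_source_pairs(ALL)          # single-source round-robin
loo_all = leave_one_out(ALL)              # (iii) leave-one-out over all twelve
within_lab = leave_one_out(LAB)           # (i) within-setting, assumed lab-only leave-one-out
wild_to_lab = cross_setting(WILD, LAB)    # (ii) cross-setting, in-the-wild to lab

# Figures reported in the abstract.
within_corpus_mean = 76.4
best_multi_source_mean = 65.6
drop_pp = within_corpus_mean - best_multi_source_mean

# Assumption: chance level is a uniform guess over seven classes.
chance_pct = 100 / 7

print(f"single-source pairs: {len(pairs)}")
print(f"leave-one-out splits: {len(loo_all)}")
print(f"drop: {drop_pp:.1f} percentage points")
print(f"chance: {chance_pct:.1f}%")
```

Under these assumptions, twelve datasets give 12 × 11 ordered single-source pairs, which is why the round-robin design is described as extensive.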