People can innately recognize human facial expressions in unnatural forms, such as when they are depicted on the unusual faces drawn in cartoons or applied to an animal's features. However, current machine learning algorithms struggle with out-of-domain transfer in facial expression recognition (FER). We propose a biologically inspired mechanism for such transfer learning based on norm-referenced encoding, in which patterns are encoded as difference vectors relative to a domain-specific reference vector. By incorporating domain-specific reference frames, we demonstrate high data efficiency in transfer learning across multiple domains. Our proposed architecture offers an explanation for how the human brain might innately recognize facial expressions on varying head shapes (humans, monkeys, and cartoon avatars) without extensive training. Norm-referenced encoding also allows the intensity of an expression to be read out directly from neural unit activity, similar to face-selective neurons in the brain. Our model achieves a classification accuracy of 92.15\% on the FERG dataset with extreme data efficiency: we train the proposed mechanism on only 12 images, comprising a single image of each class (facial expression) and one image per domain (avatar). In comparison, the authors of the FERG dataset achieved a classification accuracy of 89.02\% with their FaceExpr model, which was trained on 43,000 images.
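The norm-referenced encoding described in the abstract can be illustrated with a minimal sketch. It assumes each face is already summarized by a feature vector, and it uses one common formulation of the unit response (the length of the difference vector times a rectified, sharpened cosine with a preferred direction). That response function, the exponent `nu`, the function names, and the synthetic feature vectors are all assumptions made for illustration; they are not taken from the model itself.

```python
import numpy as np

def fit_directions(ref, exemplars):
    """Turn one exemplar per expression into unit-length preferred directions.

    ref:       (D,) reference (norm) vector of the training domain
    exemplars: (K, D) one feature vector per expression class
    """
    diffs = exemplars - ref
    return diffs / np.linalg.norm(diffs, axis=1, keepdims=True)

def nre_response(x, ref, directions, nu=2.0):
    """Norm-referenced response of K units to a feature vector x.

    Each unit responds with |d| * max(0, cos(d, n_k)) ** nu, where
    d = x - ref is the difference vector to the domain reference.
    (Assumed formulation; nu is a hypothetical tuning-sharpness parameter.)
    """
    d = x - ref
    length = np.linalg.norm(d)
    if length == 0.0:
        return np.zeros(len(directions))
    cos = directions @ (d / length)
    return length * np.clip(cos, 0.0, None) ** nu

def classify(x, ref, directions):
    """Predict the expression as the unit with the strongest response."""
    return int(np.argmax(nre_response(x, ref, directions)))

# Synthetic demo: learn directions in domain A, reuse them in domain B.
rng = np.random.default_rng(0)
dim, n_classes = 8, 6                      # illustrative sizes only
ref_a = rng.normal(size=dim)               # reference vector, domain A
exemplars_a = ref_a + rng.normal(size=(n_classes, dim))
directions = fit_directions(ref_a, exemplars_a)

ref_b = rng.normal(size=dim) * 5.0         # a very different domain reference
half = ref_b + 0.5 * directions[2]         # class 2 at half intensity
full = ref_b + 1.0 * directions[2]         # class 2 at full intensity

pred = classify(half, ref_b, directions)
r_half = nre_response(half, ref_b, directions)[2]
r_full = nre_response(full, ref_b, directions)[2]
print(pred, r_half, r_full)
```

In this toy setting, moving to a new domain requires only its reference vector `ref_b`, which mirrors the one-image-per-domain idea, and the winning unit's response grows linearly with the length of the difference vector, which is the sense in which expression intensity can be read out directly. Real domains need not share exactly the same difference directions, so this sketch shows the encoding principle rather than the reported accuracy.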