Due to its importance in facial behaviour analysis, facial action unit (AU) detection has attracted increasing attention from the research community. Leveraging the online knowledge distillation framework, we propose the ``FANTrans'' method for AU detection. Our model consists of a hybrid network of convolution and transformer blocks to learn per-AU features and to model AU co-occurrences. The model uses a pre-trained face alignment network as the feature extractor. After further transformation by a small learnable add-on convolutional subnet, the per-AU features are fed into transformer blocks to enhance their representation. As multiple AUs often appear together, we propose a learnable attention-drop mechanism in the transformer block to learn the correlations between the features of different AUs. We also design a classifier that predicts the presence of each AU by considering the features of all AUs, to explicitly capture label dependencies. Finally, we adapt online knowledge distillation to the training stage of this task, further improving the model's performance. Experiments on the BP4D and DISFA datasets demonstrate the effectiveness of the proposed method.
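The cross-AU classifier described above can be illustrated with a minimal sketch. This is not the paper's implementation: the feature dimension, the number of AUs, and the use of a per-AU linear map over the concatenation of all AUs' features are assumptions, chosen only to show how each AU's prediction can depend on every AU's representation rather than on its own feature alone.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_AUS, DIM = 12, 64  # hypothetical: 12 AUs (as in BP4D), 64-dim per-AU features

# Per-AU features as produced by the transformer blocks, shape (num_AUs, dim).
au_feats = rng.standard_normal((NUM_AUS, DIM))

# Hypothetical classifier weights: the logit for AU i is a learned linear
# function of *all* AUs' features, so label dependencies (co-occurrences)
# are captured explicitly in the prediction head.
W = rng.standard_normal((NUM_AUS, NUM_AUS, DIM)) * 0.01  # weight of AU j's feature in AU i's logit
b = np.zeros(NUM_AUS)

# logit_i = sum_j W[i, j] . au_feats[j] + b_i
logits = np.einsum("ijd,jd->i", W, au_feats) + b

# AU detection is multi-label, so each AU gets an independent sigmoid.
probs = 1.0 / (1.0 + np.exp(-logits))
```

In a trained model `W` and `b` would be learned jointly with the backbone; the point of the sketch is only that row `i` of `W` spans every AU's feature, unlike a per-AU classifier that would use `au_feats[i]` alone.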