Micro-expressions are spontaneous, rapid, and subtle facial movements that can be neither forged nor suppressed. They are important nonverbal communication cues, but their transience and low intensity make them difficult to recognize. Deep learning methods based on feature extraction and fusion have recently been developed for micro-expression (ME) recognition; however, feature learning targeted to ME characteristics and efficient feature fusion remain underexplored. To address these issues, we propose Feature Representation Learning with adaptive Displacement Generation and Transformer fusion (FRL-DGT), a novel framework in which a convolutional Displacement Generation Module (DGM), trained with self-supervised learning, extracts from the onset/apex frames dynamic features tailored to the subsequent ME recognition task, and a well-designed Transformer Fusion mechanism, composed of three Transformer-based fusion modules (local and global fusion over AU regions, plus full-face fusion), extracts multi-level informative features from the DGM output for the final ME prediction. Extensive experiments under the solid leave-one-subject-out (LOSO) evaluation protocol demonstrate the superiority of the proposed FRL-DGT over state-of-the-art methods.
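The pipeline described above can be sketched at a very high level: a displacement-style dynamic feature is derived from the onset/apex frame pair, split into region tokens, and fused with attention. This is a minimal illustrative sketch only, not the paper's implementation: the frame difference stands in for the learned convolutional DGM, the 2x2 grid of patches stands in for AU regions, and all shapes and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def displacement_features(onset, apex):
    # Stand-in for the convolutional DGM: the raw frame difference
    # approximates the displacement the DGM would learn to generate
    # with self-supervision.
    return apex - onset

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(tokens, d):
    # Single-head scaled dot-product attention over region tokens:
    # the basic building block of a Transformer-based fusion module.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))  # attention weights across regions
    return A @ V

# Toy 8x8 grayscale onset/apex frames with subtle apex motion.
onset = rng.random((8, 8))
apex = onset + 0.05 * rng.random((8, 8))

disp = displacement_features(onset, apex)

# Split the displacement map into four 4x4 "region" tokens
# (a crude proxy for AU regions) and fuse them with attention.
tokens = np.stack(
    [disp[:4, :4], disp[:4, 4:], disp[4:, :4], disp[4:, 4:]]
).reshape(4, 16)
fused = attention_fuse(tokens, 16)
print(fused.shape)  # (4, 16): one fused feature vector per region
```

In the actual FRL-DGT, the fused region features would be further combined with a full-face fusion module before the final ME classification head.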