Facial expression recognition (FER) is a challenging topic in artificial intelligence. Recently, many researchers have attempted to apply the Vision Transformer (ViT) to the FER task. However, ViT cannot fully utilize the emotional features extracted from raw images, and it requires substantial computing resources. To overcome these problems, we propose a quaternion orthogonal transformer (QOT) for FER. First, to reduce redundancy among the features extracted from a pre-trained ResNet-50, we use an orthogonal loss to decompose and compact these features into three sets of orthogonal sub-features. Second, the three orthogonal sub-features are integrated into a quaternion matrix, which maintains the correlations between the different orthogonal components. Finally, we develop a quaternion vision transformer (Q-ViT) for feature classification. The Q-ViT replaces the original operations in ViT with quaternion operations, which improves the final accuracy while using fewer parameters. Experimental results on three in-the-wild FER datasets show that the proposed QOT outperforms several state-of-the-art models and reduces computational cost.
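The three steps named in the abstract (an orthogonal loss, packing three sub-features into a quaternion, and quaternion operations in place of real-valued ones) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`orthogonal_loss`, `to_quaternion`, `hamilton_product`, `quaternion_linear`) are hypothetical, the squared-cosine penalty is only one common way to encourage orthogonality, and placing the three sub-features in the imaginary parts with a zero real part is an assumption. Only the Hamilton product itself is standard quaternion algebra.

```python
import numpy as np


def orthogonal_loss(f1, f2, f3):
    """Hypothetical orthogonality penalty (assumed form, not from the paper):
    mean squared cosine similarity over the three pairs of sub-features.
    Each input has shape (batch, dim)."""
    def unit(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)

    u = [unit(f) for f in (f1, f2, f3)]
    pairs = [(0, 1), (0, 2), (1, 2)]
    return float(np.mean([np.mean(np.sum(u[a] * u[b], axis=1) ** 2)
                          for a, b in pairs]))


def to_quaternion(f1, f2, f3):
    """Assumption: the three sub-features become the i, j, k parts of a
    pure quaternion (real part zero). Output shape: (..., 4)."""
    return np.stack([np.zeros_like(f1), f1, f2, f3], axis=-1)


def hamilton_product(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [r, i, j, k]."""
    pr, pi, pj, pk = np.moveaxis(p, -1, 0)
    qr, qi, qj, qk = np.moveaxis(q, -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)


def quaternion_linear(W, x):
    """Quaternion linear map: y[o] = sum_n W[o, n] (x) x[n].
    W has shape (n_out, n_in, 4); x has shape (n_in, 4); y is (n_out, 4)."""
    return hamilton_product(W, x[None, :, :]).sum(axis=1)


rng = np.random.default_rng(0)
n_in, n_out = 8, 5

# Three toy sub-feature vectors standing in for the decomposed backbone features.
f1, f2, f3 = (rng.normal(size=(1, n_in)) for _ in range(3))
x = to_quaternion(f1, f2, f3)[0]          # (n_in, 4)

W = rng.normal(size=(n_out, n_in, 4))     # quaternion weights
y = quaternion_linear(W, x)               # (n_out, 4)

# A real-valued layer mapping 4*n_in inputs to 4*n_out outputs needs
# 16 * n_in * n_out weights; the quaternion layer needs 4 * n_in * n_out.
quaternion_params = W.size
real_params = (4 * n_in) * (4 * n_out)
```

In this sketch the quaternion layer uses a quarter of the weights of a real-valued layer with the same input and output widths, because each quaternion weight is reused across all four components through the Hamilton product. That weight sharing is the usual argument for why quaternion networks need fewer parameters, and it is consistent with the abstract's claim, though the exact savings in Q-ViT depend on details not given here.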