Prior work has shown that the order in which a sequential learner processes the different components of the face can play an important role in the performance of facial expression recognition systems. We propose FaceTopoNet, an end-to-end deep model for facial expression recognition that is capable of learning an effective tree topology of the face. The model traverses the learned tree to generate a sequence, which is used to form an embedding that is fed to a sequential learner. The model adopts two streams, one for learning structure and one for learning texture. The structure stream focuses on the positions of the facial landmarks, while the texture stream focuses on the patches around the landmarks to learn textural information. We then fuse the outputs of the two streams using an attention-based fusion strategy. We perform extensive experiments on four large-scale in-the-wild facial expression datasets (AffectNet, FER2013, ExpW, and RAF-DB) and one lab-controlled dataset (CK+) to evaluate our approach. FaceTopoNet achieves state-of-the-art performance on three of the five datasets and obtains competitive results on the other two. We also perform rigorous ablation and sensitivity experiments to evaluate the impact of different components and parameters in our model. Lastly, we perform robustness experiments and demonstrate that FaceTopoNet is more robust to occlusions than other leading methods in the area.
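
To make the pipeline described above more concrete, the following is a minimal sketch of two of its steps: turning a tree over facial landmarks into a sequence, and fusing two stream outputs with attention weights. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the traversal order or the form of the attention, so the depth-first pre-order walk, the two-way softmax fusion, and the names `traverse_tree` and `attention_fuse` are all hypothetical. In the actual model the tree topology and the attention scores would be learned; here both are supplied by hand.

```python
import math
from typing import Dict, List, Sequence


def traverse_tree(children: Dict[int, List[int]], root: int) -> List[int]:
    """Return landmark indices in depth-first, pre-order visit order.

    `children` maps a node to its ordered list of child nodes. The traversal
    order is an assumption; the abstract only says the tree is traversed.
    """
    order: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order


def attention_fuse(
    structure: Sequence[float],
    texture: Sequence[float],
    scores: Sequence[float],
) -> List[float]:
    """Fuse two equal-length embeddings with softmax weights.

    `scores` holds two scalars (structure score, texture score). In a trained
    model these would come from a learned attention module (assumed here).
    """
    if len(structure) != len(texture):
        raise ValueError("embeddings must have the same length")
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    w_structure, w_texture = (e / total for e in exps)
    return [w_structure * a + w_texture * b for a, b in zip(structure, texture)]


# Toy tree over five hypothetical landmarks, rooted at landmark 0.
toy_tree = {0: [1, 2], 1: [3], 2: [4]}
sequence = traverse_tree(toy_tree, root=0)

# Equal scores give equal weight to both streams.
fused = attention_fuse([1.0, 0.0], [0.0, 1.0], scores=[0.0, 0.0])
```

With the toy tree above, `sequence` is `[0, 1, 3, 2, 4]`, and equal attention scores yield `fused == [0.5, 0.5]`. A different tree over the same landmarks would produce a different sequence, which is why the choice of topology matters for the sequential learner downstream.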