We propose an end-to-end architecture for facial expression recognition. Our model learns an optimal tree topology over facial landmarks; traversing this tree generates a sequence, from which we obtain an embedding that is fed to a sequential learner. The architecture comprises two main streams: one operates on landmark positions to learn the structure of the face, and the other operates on patches around the landmarks to learn texture information. Each stream is followed by an attention mechanism, and the outputs are fed to a two-stream fusion component that performs the final classification. We conduct extensive experiments on two large-scale publicly available facial expression datasets, AffectNet and FER2013, to evaluate the efficacy of our approach. Our method outperforms other solutions in the area and sets new state-of-the-art expression recognition rates on these datasets.
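To make the tree-to-sequence step concrete, the following minimal sketch shows how traversing a tree over landmark indices can yield an ordered sequence of landmark positions. It is illustrative only: the abstract does not state the traversal order or how the topology is learned, so the depth-first pre-order traversal, the fixed toy tree, and the names `traverse_tree`, `landmarks_to_sequence`, `toy_landmarks`, and `toy_tree` are all assumptions rather than the authors' implementation.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def traverse_tree(children: Dict[int, List[int]], root: int) -> List[int]:
    """Return landmark indices in depth-first pre-order, starting at root.

    Assumption: pre-order DFS is used here only for illustration; the
    abstract does not specify which traversal the model uses.
    """
    order: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the first-listed child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order


def landmarks_to_sequence(
    landmarks: List[Point], children: Dict[int, List[int]], root: int
) -> List[Point]:
    """Order landmark coordinates by the tree traversal."""
    return [landmarks[i] for i in traverse_tree(children, root)]


# Hypothetical 5-landmark face and a hand-picked (not learned) tree.
toy_landmarks: List[Point] = [
    (0.5, 0.5),  # 0: nose tip
    (0.3, 0.3),  # 1: left eye
    (0.7, 0.3),  # 2: right eye
    (0.4, 0.8),  # 3: left mouth corner
    (0.6, 0.8),  # 4: right mouth corner
]
toy_tree: Dict[int, List[int]] = {0: [1, 2], 1: [3], 2: [4]}

sequence = landmarks_to_sequence(toy_landmarks, toy_tree, root=0)
```

In the described architecture, a sequence of this kind would be embedded and passed to the sequential learner; a different tree topology produces a different ordering, which is presumably why the topology itself is learned rather than fixed.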