End-to-end convolutional representation learning has proven very effective for facial action unit (AU) detection. Considering the co-occurrence and mutual exclusion relationships between facial AUs, in this paper we propose convolutional neural networks with Local Region Relation Learning (LoRRaL), which incorporate latent relationships among AUs into an end-to-end approach to facial AU occurrence detection. LoRRaL consists of three components: 1) a bi-directional long short-term memory (BiLSTM) network that dynamically and sequentially encodes local AU feature maps; 2) a self-attention mechanism that dynamically computes correspondences between local facial regions and re-aggregates AU feature maps to account for AU co-occurrences and mutual exclusions; and 3) a continuous-state modern Hopfield network that encodes local facial features and maps them to more discriminative AU feature maps. All of these networks take a facial image as input and map it to AU occurrences. Without any external data or pre-trained models, our experiments on the challenging BP4D and DISFA benchmarks yield F1-scores of 63.5% and 61.4%, respectively, demonstrating that the proposed networks improve performance on the AU detection task.
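The abstract does not specify how the self-attention and Hopfield components are parameterized. As a minimal sketch, assuming each local facial region has already been reduced to a d-dimensional feature vector and using identity query, key, and value projections (a simplification; a trained model would learn these), the code below shows one attention-style re-aggregation step across regions. It is written as the continuous-state modern Hopfield retrieval update xi_new = X softmax(beta * X^T xi), which coincides with scaled dot-product self-attention when beta = 1/sqrt(d). The function names `hopfield_update` and `region_self_attention` and the toy region vectors are hypothetical illustrations, not the authors' LoRRaL implementation.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hopfield_update(xi, patterns, beta):
    """One retrieval step of a continuous-state modern Hopfield network:
    xi_new = X softmax(beta * X^T xi), with stored patterns X given as a list of vectors."""
    weights = softmax([beta * dot(x, xi) for x in patterns])
    dim = len(xi)
    # Weighted (convex) combination of the stored patterns.
    return [sum(w * x[i] for w, x in zip(weights, patterns)) for i in range(dim)]

def region_self_attention(regions, beta=None):
    """Hypothetical helper: re-aggregate every local-region feature vector as an
    attention-weighted sum over all regions. Queries, keys, and values are the raw
    region vectors (identity projections); beta defaults to 1/sqrt(d) as in
    scaled dot-product attention."""
    dim = len(regions[0])
    if beta is None:
        beta = 1.0 / math.sqrt(dim)
    return [hopfield_update(q, regions, beta) for q in regions]

# Toy features for three hypothetical facial regions (not data from the paper).
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mixed = region_self_attention(regions)
# A large beta sharpens the softmax so the update nearly retrieves the stored
# pattern with the highest dot-product similarity to the query.
retrieved = hopfield_update([1.0, 0.0], regions, beta=50.0)
```

Raising beta pushes the softmax toward a single stored pattern (the retrieval behavior of the Hopfield view), while beta = 1/sqrt(d) gives ordinary scaled dot-product attention. A real model would add learned projections and operate on convolutional feature maps rather than short toy vectors.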
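The reported results are F1-scores on a multi-label task in which each AU is predicted as present or absent in every frame. A minimal sketch of how such a score can be computed, assuming the common BP4D/DISFA protocol of computing a binary F1 per AU and averaging across AUs (the abstract itself does not restate the protocol). The helper names `f1_binary` and `mean_au_f1` are hypothetical, and the labels are toy values, not data from the paper.

```python
def f1_binary(y_true, y_pred):
    """Binary F1 = 2TP / (2TP + FP + FN) for one AU across all frames."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention (assumed): F1 is 0.0 when the AU never occurs and is never predicted.
    return 0.0 if denom == 0 else 2 * tp / denom

def mean_au_f1(labels, preds):
    """labels, preds: one list per frame, each holding a 0/1 value per AU.
    Returns the mean F1 over AUs and the per-AU scores."""
    n_aus = len(labels[0])
    scores = [
        f1_binary([frame[k] for frame in labels], [frame[k] for frame in preds])
        for k in range(n_aus)
    ]
    return sum(scores) / n_aus, scores

# Toy example: 4 frames, 2 hypothetical AUs.
labels = [[1, 0], [1, 1], [0, 1], [0, 0]]
preds = [[1, 0], [0, 1], [0, 1], [1, 0]]
mean_f1, per_au = mean_au_f1(labels, preds)
```

Here the first AU has one true positive, one false positive, and one false negative (F1 = 0.5), the second is predicted perfectly (F1 = 1.0), and the mean is 0.75. Averaging per-AU scores weights rare and frequent AUs equally, which is why the mean can differ noticeably from an F1 computed over all AU labels pooled together.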