Most state-of-the-art approaches for Facial Action Unit (AU) detection rely on evaluating facial expressions from static frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are usually more subtle and evolve over time, requiring AU detection models to learn temporal as well as spatial information. In this paper, we focus on both spatial and spatio-temporal features that encode the temporal evolution of facial AU activation. For this purpose, we propose the Action Unit Lifecycle-Aware Capsule Network (AULA-Caps), which performs AU detection using both frame-level and sequence-level features. At the frame level, the capsule layers of AULA-Caps learn spatial feature primitives to determine AU activations, while at the sequence level they learn temporal dependencies between contiguous frames by focusing on relevant spatio-temporal segments in the sequence. The learnt feature capsules are routed together such that the model learns to selectively focus more on spatial or spatio-temporal information depending upon the AU lifecycle. The proposed model is evaluated on the commonly used BP4D and GFT benchmark datasets, obtaining state-of-the-art results on both datasets.
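The routing of feature capsules mentioned above follows the general routing-by-agreement idea behind capsule networks. As a rough illustration only (not the authors' AULA-Caps implementation; the function names `squash` and `route` and all shapes are assumptions for this sketch), a minimal numpy version of dynamic routing between capsules might look like:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Squash non-linearity: shrinks a vector's norm into [0, 1)
    # while preserving its direction.
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def route(u_hat, iters=3):
    # u_hat: prediction vectors from lower-level capsules,
    # shape (num_in, num_out, dim_out). Hypothetical toy routine.
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(iters):
        # Coupling coefficients: softmax over output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted vote per output capsule
        v = squash(s)                            # output capsule vectors
        b = b + (u_hat * v[None]).sum(axis=-1)   # strengthen agreeing routes
    return v

# Toy example: 4 input capsules voting for 2 output capsules of dimension 3.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(4, 2, 3))
v = route(u_hat)
print(v.shape)  # (2, 3)
```

In this scheme, the iterative agreement update plays the role of "selectively focusing": routes whose predictions agree with the output capsule are reinforced, which is the mechanism a lifecycle-aware model could use to weight spatial versus spatio-temporal capsules.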