Automatic facial action unit (AU) recognition is a challenging task due to the scarcity of manual annotations. To alleviate this problem, considerable effort has been dedicated to methods that leverage large amounts of unlabeled data. However, several unique properties of AUs, such as their regional and relational characteristics, are not sufficiently explored in previous works. Motivated by this, we take these AU properties into consideration and propose two auxiliary AU-related tasks that bridge the gap between limited annotations and model performance in a self-supervised manner using unlabeled data. Specifically, to enhance the discrimination of regional features with AU relation embedding, we design a RoI inpainting task that recovers randomly cropped AU patches. Meanwhile, a single-image optical flow estimation task is proposed to leverage the dynamic change of facial muscles and encode motion information into the global feature representation. Based on these two self-supervised auxiliary tasks, local features, mutual relations, and motion cues of AUs are better captured by the backbone network with the proposed regional and temporal based auxiliary task learning (RTATL) framework. Extensive experiments on BP4D and DISFA demonstrate the superiority of our method, achieving new state-of-the-art performance.
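As a rough illustration of the RoI inpainting pretext task described above, the sketch below masks random square patches around AU-related landmark locations; the recovery network would then be trained to reconstruct the masked regions. The function name, patch size, and landmark coordinates are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mask_random_rois(image, roi_centers, patch_size=24, num_mask=2, rng=None):
    """Zero out randomly chosen square patches centered on AU-related landmarks.

    Illustrative input preparation for an inpainting pretext task (assumed
    helper, not the paper's pipeline). Returns the masked image and the
    cropped ground-truth patches used as reconstruction targets.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    masked = image.copy()
    half = patch_size // 2
    chosen = rng.choice(len(roi_centers), size=num_mask, replace=False)
    targets = []
    for idx in chosen:
        y, x = roi_centers[idx]
        y0, y1 = max(0, y - half), min(image.shape[0], y + half)
        x0, x1 = max(0, x - half), min(image.shape[1], x + half)
        targets.append(image[y0:y1, x0:x1].copy())  # reconstruction target
        masked[y0:y1, x0:x1] = 0.0                  # cropped region to inpaint
    return masked, targets

# Usage: mask 2 of 4 hypothetical AU regions in a 128x128 face image
img = np.ones((128, 128, 3), dtype=np.float32)
centers = [(32, 32), (32, 96), (96, 32), (96, 96)]
masked_img, gt_patches = mask_random_rois(img, centers)
```

In a full training loop, the reconstruction loss (e.g. an L1 or L2 penalty) would be computed only over the masked regions, so the network must rely on the surrounding facial context and AU co-occurrence relations to fill them in.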