Face-based affective computing consists of detecting emotions from face images. It enables better automatic understanding of human behaviour and could pave the way toward improved human-machine interaction. However, it comes with the challenging task of designing a computational representation of emotions. So far, emotions have been represented either continuously in the 2D Valence/Arousal space or discretely with Ekman's 7 basic emotions. Alternatively, Ekman's Facial Action Unit (AU) system has also been used to characterize emotions using a codebook of unitary muscular activations. The ABAW3 and ABAW4 Multi-Task Challenges are the first to provide a large-scale database annotated with all three types of labels. In this paper, we present a transformer-based multi-task method for jointly learning to predict valence-arousal, action units and basic emotions. From an architectural standpoint, our method uses a task-wise token approach to efficiently model the similarities between the tasks. From a learning point of view, we use an uncertainty-weighted loss to model the differences in stochasticity among the annotations of the three tasks.
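The abstract does not give the exact form of the uncertainty-weighted loss. As a minimal sketch, assuming the commonly used homoscedastic-uncertainty formulation for multi-task learning, each task loss L_i is scaled by exp(-s_i) and regularized by s_i, where s_i = log(sigma_i^2) is a per-task log-variance. The function name, task names and loss values below are hypothetical and chosen only for illustration.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    task_losses: dict mapping task name -> scalar loss L_i
    log_vars:    dict mapping task name -> s_i = log(sigma_i^2)
    (Assumed formulation; the paper may use a different variant.)
    """
    if task_losses.keys() != log_vars.keys():
        raise ValueError("task_losses and log_vars must cover the same tasks")
    total = 0.0
    for task, loss in task_losses.items():
        s = log_vars[task]
        # A larger s (noisier annotations) down-weights the task loss,
        # while the additive s term keeps s from growing without bound.
        total += math.exp(-s) * loss + s
    return total


# Hypothetical per-task losses for one batch.
losses = {"valence_arousal": 0.40, "action_units": 0.70, "expression": 1.20}
# With all log-variances at zero, the result is the plain sum of the losses.
log_vars = {"valence_arousal": 0.0, "action_units": 0.0, "expression": 0.0}
print(uncertainty_weighted_loss(losses, log_vars))
```

In an actual training loop the log-variances would typically be learnable parameters optimized jointly with the network weights, so that tasks with noisier annotations are automatically given less weight.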