Temporal context is key to the recognition of expressions of emotion. Existing methods that rely on recurrent or self-attention models to enforce temporal consistency operate at the feature level, ignore task-specific temporal dependencies, and fail to model context uncertainty. To alleviate these issues, we build upon the framework of Neural Processes to propose a method for apparent emotion recognition with three key novel components: (a) probabilistic contextual representation with a global latent variable model; (b) temporal context modelling using task-specific predictions in addition to features; and (c) smart temporal context selection. We validate our approach on four databases: two for Valence and Arousal estimation (SEWA and AffWild2) and two for Action Unit intensity estimation (DISFA and BP4D). Results show a consistent improvement over a series of strong baselines as well as over state-of-the-art methods.
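To make components (a) and (b) concrete, the following is a minimal sketch of a generic Neural Process style latent path, not the authors' implementation. It assumes that each context frame contributes a feature vector concatenated with a task-specific prediction (here two values standing in for valence and arousal), that per-frame representations are mean-pooled, and that the pooled vector parameterises a Gaussian global latent. All names (`encode_context`, `global_latent`), the dimensions, and the random weights used in place of learned networks are hypothetical.

```python
import numpy as np


def encode_context(features, predictions, w, b):
    """Map each context frame (feature + task prediction) to a representation."""
    # Assumption: predictions are simply concatenated to the features.
    pairs = np.concatenate([features, predictions], axis=1)  # (n_ctx, d_feat + d_pred)
    return np.tanh(pairs @ w + b)                             # (n_ctx, d_rep)


def global_latent(reps, w_mu, w_sigma, rng):
    """Aggregate per-frame representations into one Gaussian latent and sample it."""
    pooled = reps.mean(axis=0)  # permutation-invariant aggregation over context frames
    mu = pooled @ w_mu
    # Softplus (via logaddexp) keeps the standard deviation strictly positive.
    sigma = 0.1 + 0.9 * np.logaddexp(0.0, pooled @ w_sigma)
    z = mu + sigma * rng.standard_normal(mu.shape)  # reparameterised sample
    return z, mu, sigma


rng = np.random.default_rng(0)
n_ctx, d_feat, d_pred, d_rep, d_z = 8, 16, 2, 32, 4

# Hypothetical context: 8 frames of features plus per-frame predictions.
features = rng.standard_normal((n_ctx, d_feat))
predictions = rng.uniform(-1.0, 1.0, (n_ctx, d_pred))

# Random weights stand in for learned encoder parameters.
w = rng.standard_normal((d_feat + d_pred, d_rep)) * 0.1
b = np.zeros(d_rep)
w_mu = rng.standard_normal((d_rep, d_z)) * 0.1
w_sigma = rng.standard_normal((d_rep, d_z)) * 0.1

reps = encode_context(features, predictions, w, b)
z, mu, sigma = global_latent(reps, w_mu, w_sigma, rng)
```

In this sketch the spread `sigma` is what would carry context uncertainty: a decoder conditioned on different samples of `z` would produce different predictions for the same target frame. Component (c), context selection, is not shown, since the abstract does not specify how frames are chosen.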