Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction. To this end, we propose a novel fusion framework in which we first learn latent distributions over audiovisual temporal context vectors separately, and then constrain the variance vectors of unimodal latent distributions so that they represent the amount of information each modality provides w.r.t. emotion recognition. In particular, we impose Calibration and Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions may differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across the modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. In both classification and regression settings, we compare our uncertainty-aware fusion model with standard model-agnostic fusion baselines. Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.
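To make the idea of a softmax distributional matching loss more concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each frame of each modality yields a scalar uncertainty score (for example, a summary of its latent variance vector) and a scalar prediction error against the ground truth. It then compares the softmax distribution over errors with the softmax distribution over uncertainty scores using a KL divergence. The function names, the use of KL divergence, and the toy numbers are all illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def softmax_matching_loss(errors, uncertainties):
    """Hypothetical matching loss (an assumption, not the paper's exact form).

    errors:        per-frame, per-modality distances between predictions
                   and ground-truth labels.
    uncertainties: scalar uncertainty scores for the same frames and
                   modalities, e.g. derived from latent variance vectors.
    The loss is small when larger uncertainty goes with larger error
    (calibration) and the relative ordering is preserved (ranking).
    """
    p = softmax(errors)
    q = softmax(uncertainties)
    return kl_divergence(p, q)

# Toy values: audio and visual scores for two frames, flattened into one list.
errors = [0.1, 0.8, 0.3, 0.5]
well_ranked = [0.2, 0.9, 0.4, 0.6]  # same ordering and spacing as the errors
mis_ranked = [0.9, 0.2, 0.6, 0.4]   # ordering reversed relative to the errors

loss_good = softmax_matching_loss(errors, well_ranked)
loss_bad = softmax_matching_loss(errors, mis_ranked)
```

In this sketch, uncertainty scores that follow the prediction errors give a loss close to zero, while scores ranked in the opposite order give a clearly larger loss. That is the behaviour the Calibration and Ordinal Ranking constraints are meant to encourage.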