Multimodal emotion recognition is a challenging research area that aims to fuse different modalities to predict human emotion. However, most existing attention-based models have difficulty learning emotionally relevant parts on their own. To address this problem, we propose to incorporate external emotion-related knowledge into the co-attention-based fusion of pre-trained models. To incorporate this knowledge effectively, we enhance the co-attention model with a Bayesian attention module (BAM) in which a prior distribution is estimated from the emotion-related knowledge. Experimental results on the IEMOCAP dataset show that the proposed approach outperforms several state-of-the-art approaches by at least 0.7% in unweighted accuracy (UA).
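To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of a BAM-style cross-modal attention layer: attention weights are sampled from a data-dependent distribution and regularized toward a prior whose location is set by external emotion-relevance scores. The class name `BayesianCoAttention`, the `knowledge_scores` input, and the fixed scale `sigma` are all assumptions introduced for illustration.

```python
# Sketch of a Bayesian attention module with a knowledge-informed prior.
# Assumption: Lognormal posterior/prior over unnormalized attention weights,
# with the KL term computed on the underlying Normals (KL is invariant under
# the exp transform), so only Normal-Normal KL is needed.

import torch
import torch.nn as nn


class BayesianCoAttention(nn.Module):
    """Cross-modal attention whose weights are sampled and pulled toward a
    prior built from external emotion-related knowledge scores."""

    def __init__(self, dim: int, sigma: float = 0.5):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.sigma = sigma  # fixed scale for posterior and prior (assumption)

    def forward(self, x_a, x_b, knowledge_scores):
        # x_a: (B, Ta, D) queries from one modality (e.g. audio)
        # x_b: (B, Tb, D) keys/values from the other modality (e.g. text)
        # knowledge_scores: (B, Tb) emotion-relevance scores from an external
        # knowledge source (hypothetical input), used to center the prior.
        q, k, v = self.query(x_a), self.key(x_b), self.value(x_b)
        logits = torch.matmul(q, k.transpose(-1, -2)) / q.size(-1) ** 0.5  # (B, Ta, Tb)

        # Posterior over log-weights, centered on the co-attention logits.
        post = torch.distributions.Normal(logits, self.sigma)
        weights = post.rsample().exp()                   # reparameterized Lognormal sample
        attn = weights / weights.sum(-1, keepdim=True)   # normalize to attention weights

        # Prior over log-weights, centered on the log knowledge scores.
        prior_loc = torch.log(knowledge_scores.clamp_min(1e-6)).unsqueeze(1)
        prior = torch.distributions.Normal(prior_loc.expand_as(logits), self.sigma)
        kl = torch.distributions.kl_divergence(post, prior).mean()

        return torch.matmul(attn, v), kl
```

In training, the returned KL term would be added to the task loss with a small weight, so the sampled attention is encouraged to agree with the externally estimated emotion relevance while still fitting the data.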