Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for enabling computers to understand human behaviors. From a psychological perspective, emotions are expressions of affect or feelings over a short period, while sentiments are formed and held over a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks at the feature, label, and model levels. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and between samples to better capture the differences and consistencies between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method, which achieves consistent improvements over state-of-the-art methods.
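To make the inter-modality contrastive objective concrete, the sketch below shows a minimal InfoNCE-style loss that pulls together representations of the same sample from two modalities and pushes apart representations of different samples in a batch. This is only an illustrative assumption of how such a loss can be written, not the paper's actual implementation; the names `text_feat`, `audio_feat`, and `temperature` are hypothetical.

```python
# Illustrative sketch (assumed, not the paper's implementation):
# an InfoNCE-style contrastive loss between two modality views of a batch.
import torch
import torch.nn.functional as F

def inter_modality_contrastive_loss(text_feat: torch.Tensor,
                                    audio_feat: torch.Tensor,
                                    temperature: float = 0.07) -> torch.Tensor:
    """Align two modality representations of the same sample (positives)
    while separating representations of different samples (negatives)."""
    # L2-normalize so dot products become cosine similarities.
    text_feat = F.normalize(text_feat, dim=-1)
    audio_feat = F.normalize(audio_feat, dim=-1)

    # Similarity matrix: entry (i, j) compares sample i's text view
    # with sample j's audio view.
    logits = text_feat @ audio_feat.t() / temperature

    # Positives lie on the diagonal (same sample, different modality).
    targets = torch.arange(text_feat.size(0), device=text_feat.device)

    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage example with random features for a batch of 8 samples.
if __name__ == "__main__":
    text = torch.randn(8, 256)
    audio = torch.randn(8, 256)
    print(inter_modality_contrastive_loss(text, audio))
```

The same form extends to sample-level contrast by treating two augmented views of one utterance as the positive pair under the same loss.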