Modality representation learning is an important problem in multimodal sentiment analysis (MSA), since highly distinguishable representations help improve analysis performance. Previous MSA work has usually focused on multimodal fusion strategies, while in-depth study of modality representation learning has received less attention. Recently, contrastive learning has been shown to endow learned representations with stronger discriminative ability. Inspired by this, we explore approaches to improving modality representations with contrastive learning. To this end, we devise a three-stage framework with multi-view contrastive learning that refines representations for specific objectives. In the first stage, to improve the unimodal representations, we employ supervised contrastive learning to pull samples of the same class together while pushing the other samples apart. In the second stage, a self-supervised contrastive learning scheme is designed to improve the distilled unimodal representations after cross-modal interaction. Finally, we again leverage supervised contrastive learning to enhance the fused multimodal representation. After all the contrastive training stages, we perform the classification task on the frozen representations. We conduct experiments on three open datasets, and the results show the superiority of our model.
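The supervised contrastive objective used in the first and third stages can be sketched as follows. This is a minimal NumPy illustration of the standard supervised contrastive loss (anchors attract other samples of the same class and repel the rest), not the paper's exact implementation; the function name, temperature value, and normalization details are assumptions.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.07):
    """Illustrative supervised contrastive loss: for each anchor,
    the positives are the other samples sharing its label."""
    # L2-normalize representations so similarity is cosine similarity
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature            # pairwise scaled similarities
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # positives: same label, excluding the anchor itself
    mask_pos = (labels[:, None] == labels[None, :]) & not_self
    # log-softmax over all other samples (self excluded from the denominator)
    logits = sim - sim.max(axis=1, keepdims=True)
    denom = (np.exp(logits) * not_self).sum(axis=1, keepdims=True)
    log_prob = logits - np.log(denom)
    # average the log-probability over each anchor's positives
    pos_counts = mask_pos.sum(axis=1)
    valid = pos_counts > 0                 # skip anchors with no positive
    loss = -(mask_pos * log_prob).sum(axis=1)[valid] / pos_counts[valid]
    return loss.mean()
```

When same-class samples already share a direction in feature space the loss is near zero, and it grows as positives drift apart, which is what drives the pull-together/push-apart behavior described above.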