Existing studies on multimodal sentiment analysis rely heavily on the textual modality and unavoidably induce spurious correlations between textual words and sentiment labels, which greatly hinders model generalization. To address this problem, we define the task of out-of-distribution (OOD) multimodal sentiment analysis, which aims to estimate and mitigate the adverse effect of the textual modality in order to achieve strong OOD generalization. To this end, we adopt causal inference, which inspects causal relationships via a causal graph. From the graph, we find that the spurious correlations are attributable to the direct effect of the textual modality on the model prediction, whereas the indirect effect is more reliable because it accounts for multimodal semantics. Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis, which captures the direct effect of the textual modality via an extra text model and estimates the indirect effect with a multimodal model. During inference, we first estimate the direct effect by counterfactual inference and then subtract it from the total effect of all modalities to obtain the indirect effect, which is used for reliable prediction. Extensive experiments demonstrate the effectiveness and generalization ability of the proposed framework.
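The inference step described above (total effect minus the direct effect of text) can be illustrated with a minimal sketch. It assumes that both effects are available as class logits: one vector from a multimodal model and one from a text-only model. The function names, the scaling factor `alpha`, and the toy numbers are hypothetical and are not taken from the paper, which may compute and combine the effects differently.

```python
import numpy as np


def indirect_effect(total_logits, text_only_logits, alpha=1.0):
    """Subtract the text-only (direct) effect from the total effect.

    total_logits:     logits from a multimodal model (assumed total effect).
    text_only_logits: logits from an extra text model (assumed direct effect).
    alpha:            hypothetical scaling factor for the subtracted term.
    """
    total = np.asarray(total_logits, dtype=float)
    direct = np.asarray(text_only_logits, dtype=float)
    return total - alpha * direct


def predict(total_logits, text_only_logits, alpha=1.0):
    """Pick the class with the largest indirect effect for each sample."""
    return np.argmax(indirect_effect(total_logits, text_only_logits, alpha), axis=-1)


# Toy example: one sample, three sentiment classes (made-up numbers).
total = np.array([[2.0, 1.5, 0.5]])      # multimodal prediction favours class 0
text_only = np.array([[1.8, 0.2, 0.1]])  # text alone also pushes hard towards class 0

biased = np.argmax(total, axis=-1)       # prediction before removing the text effect
debiased = predict(total, text_only)     # prediction after removing the text effect
```

In this toy case the preference for class 0 comes almost entirely from the text branch, so subtracting it changes the predicted class. With `alpha=0.0` the sketch falls back to the ordinary multimodal prediction.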