Multimodal Stance Detection (MSD) is a crucial task for understanding public opinion on social media. Existing work simply fuses information from various modalities to learn stance representations, overlooking the fact that different modalities contribute unequally to stance expression. As a result, such crude modality combination risks introducing stance-misleading noise into the stance learning process. To address this, we draw inspiration from the dual-process theory of human cognition and propose **ReMoD**, a framework that **Re**thinks **Mo**dality contribution to stance expression through a **D**ual-reasoning paradigm. ReMoD combines *experience-driven intuitive reasoning*, which captures initial stance cues, with *deliberate reflective reasoning*, which corrects modality biases and refines stance judgments, thereby dynamically weighting each modality's contribution according to its actual expressive power for the target stance. Specifically, the intuitive stage queries the Modality Experience Pool (MEP) and the Semantic Experience Pool (SEP) to form an initial stance hypothesis, prioritizing historically impactful modalities. This hypothesis is then refined in the reflective stage via two reasoning chains: Modality-CoT updates the MEP with adaptive fusion strategies that amplify stance-relevant modalities, while Semantic-CoT refines the SEP with deeper contextual insight into stance semantics. Both experience structures are continuously refined during training and recalled at inference to guide robust, context-aware stance decisions. Extensive experiments on the public MMSD benchmark demonstrate that ReMoD significantly outperforms most baseline models and exhibits strong generalization capabilities.
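To make the dual-reasoning flow concrete, the following is a minimal sketch of how an intuitive stage could recall experience pools to weight modalities and how a reflective stage could write refined experience back. All class and function names (`ExperiencePool`, `intuitive_stage`, `reflective_stage`), the similarity-based recall, and the weight-sharpening stand-ins for Modality-CoT and Semantic-CoT are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the dual-reasoning paradigm (assumed interfaces, not ReMoD's actual code).
import torch
import torch.nn.functional as F


class ExperiencePool:
    """Stores past (key, value) experience entries and returns a similarity-weighted recall."""

    def __init__(self, dim: int, capacity: int = 512):
        self.keys = torch.empty(0, dim)    # query embeddings of past samples
        self.values = torch.empty(0, dim)  # stored experience vectors (modality weights / semantics)
        self.capacity = capacity

    def recall(self, query: torch.Tensor) -> torch.Tensor:
        if self.keys.shape[0] == 0:
            return torch.zeros_like(query)
        sims = F.softmax(query @ self.keys.T / query.shape[-1] ** 0.5, dim=-1)
        return sims @ self.values          # experience aggregated by similarity to the query

    def update(self, key: torch.Tensor, value: torch.Tensor) -> None:
        self.keys = torch.cat([self.keys, key])[-self.capacity:]
        self.values = torch.cat([self.values, value])[-self.capacity:]


def intuitive_stage(text_feat, image_feat, mep: ExperiencePool, sep: ExperiencePool):
    """Form an initial stance hypothesis, prioritizing historically impactful modalities."""
    query = (text_feat + image_feat) / 2
    modality_prior = mep.recall(query)     # recalled modality experience (MEP)
    semantic_prior = sep.recall(query)     # recalled semantic experience (SEP)
    # Weight each modality by its agreement with the recalled modality experience.
    w = F.softmax(torch.stack([
        F.cosine_similarity(text_feat, modality_prior, dim=-1),
        F.cosine_similarity(image_feat, modality_prior, dim=-1),
    ]), dim=0)
    hypothesis = w[0].unsqueeze(-1) * text_feat + w[1].unsqueeze(-1) * image_feat + semantic_prior
    return hypothesis, w, query


def reflective_stage(hypothesis, w, text_feat, image_feat, mep, sep, query):
    """Refine the hypothesis and write updated experience back into both pools."""
    # Stand-in for Modality-CoT: sharpen fusion weights toward the more expressive modality.
    refined_w = F.softmax(w * 2.0, dim=0)
    refined = refined_w[0].unsqueeze(-1) * text_feat + refined_w[1].unsqueeze(-1) * image_feat
    mep.update(query, refined)             # Modality-CoT updates MEP with the adapted fusion
    sep.update(query, hypothesis)          # Stand-in for Semantic-CoT: store contextual semantics in SEP
    return refined
```

Under these assumptions, training would call both stages per batch so the pools accumulate experience, while inference would only call `recall` to guide the stance decision.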