InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection

Sarcasm in social media, frequently conveyed through the interplay of text and images, presents significant challenges for sentiment analysis and intention mining. Existing multi-modal sarcasm detection approaches have been shown to rely excessively on superficial cues in the textual modality and have limited ability to discern sarcasm from subtle text-image interactions. To address this limitation, a novel framework, InterCLIP-MEP, is proposed. The framework integrates Interactive CLIP (InterCLIP), which employs an efficient training strategy to derive enriched cross-modal representations by embedding inter-modal information directly into each encoder, while using approximately 20.6$\times$ fewer trainable parameters than existing state-of-the-art (SOTA) methods. Furthermore, a Memory-Enhanced Predictor (MEP) is introduced. It features a dynamic dual-channel memory mechanism that captures and retains valuable knowledge from test samples during inference and serves as a non-parametric classifier to make sarcasm detection more robust. Extensive experiments on MMSD, MMSD2.0, and DocMSU show that InterCLIP-MEP achieves SOTA performance, improving accuracy by 1.08% and F1 score by 1.51% on MMSD2.0. Under distribution-shift evaluation, it attains 73.96% accuracy, exceeding its memory-free variant by nearly 10% and the previous SOTA by over 15%, which demonstrates superior stability and adaptability. The implementation of InterCLIP-MEP is publicly available at https://github.com/CoderChen01/InterCLIP-MEP.

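To make the idea of a dual-channel memory acting as a non-parametric classifier more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the update rule or the similarity measure, so several choices here are assumptions made purely for illustration: one channel per class (0 for non-sarcastic, 1 for sarcastic), admission by prediction entropy with eviction of the least confident entry, cosine similarity for the final decision, and a fallback to the classifier's own label until both channels hold at least one entry. All names (`DualChannelMemory`, `update`, `predict`, `capacity`) are hypothetical.

```python
import math
from dataclasses import dataclass, field


def entropy(probs):
    """Shannon entropy in nats; a lower value means a more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def argmax(values):
    """Index of the largest value (the classifier's pseudo-label)."""
    return max(range(len(values)), key=values.__getitem__)


@dataclass
class DualChannelMemory:
    # Assumption: a fixed per-channel capacity; the abstract gives no size.
    capacity: int = 4
    # Maps label -> list of (entropy, feature); 0 = non-sarcastic, 1 = sarcastic.
    channels: dict = field(default_factory=lambda: {0: [], 1: []})

    def update(self, feature, probs):
        """Store a test-time feature in the channel of its pseudo-label.

        Assumed rule: when the channel is full, the new feature replaces the
        least confident stored entry only if the new one is more confident.
        """
        label = argmax(probs)
        h = entropy(probs)
        slot = self.channels[label]
        if len(slot) < self.capacity:
            slot.append((h, feature))
        else:
            worst = max(range(len(slot)), key=lambda i: slot[i][0])
            if h < slot[worst][0]:
                slot[worst] = (h, feature)

    def predict(self, feature, fallback):
        """Return the label whose channel is most similar on average."""
        if any(not slot for slot in self.channels.values()):
            return fallback  # not enough stored knowledge yet
        scores = {
            label: sum(cosine(feature, f) for _, f in slot) / len(slot)
            for label, slot in self.channels.items()
        }
        return max(scores, key=scores.get)


# Toy stream of (feature, classifier probabilities) pairs; values are made up.
stream = [
    ([1.0, 0.1], [0.05, 0.95]),  # confidently sarcastic
    ([0.1, 1.0], [0.90, 0.10]),  # confidently non-sarcastic
    ([0.9, 0.3], [0.55, 0.45]),  # classifier is unsure, leans non-sarcastic
]

memory = DualChannelMemory(capacity=2)
predictions = []
for feature, probs in stream:
    # Predict from what the memory already holds, then store the new sample.
    predictions.append(memory.predict(feature, fallback=argmax(probs)))
    memory.update(feature, probs)

print(predictions)  # [1, 0, 1]
```

In this toy run the third sample is labeled non-sarcastic by the unsure classifier, but its feature lies closer to the stored sarcastic entry, so the memory-based decision overrides it. Predicting before updating is a design choice of this sketch that keeps a sample from matching its own stored copy.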