Multimodal Emotion Recognition in Conversation (MERC) aims to enhance emotion understanding by integrating complementary cues from text, audio, and visual modalities. Existing MERC approaches predominantly focus on cross-modal shared features, often overlooking modality-specific features that capture subtle yet critical emotional cues such as micro-expressions, prosodic variations, and sarcasm. Although related work in multimodal emotion recognition (MER) has explored disentangling shared and modality-specific features, these methods typically employ rigid orthogonal constraints to achieve full disentanglement, which neglects the inherent complementarity between feature types and may limit recognition performance. To address these challenges, we propose Angle-Optimized Feature Learning (AO-FL), a framework tailored for MERC that achieves partial disentanglement of shared and specific features within each modality through adaptive angular optimization. Specifically, AO-FL aligns shared features across modalities to ensure semantic consistency, and within each modality it adaptively models the angular relationship between its shared and modality-specific features to preserve both distinctiveness and complementarity. An orthogonal projection refinement further removes redundancy in specific features and enriches shared features with contextual information, yielding more discriminative multimodal representations. Extensive experiments confirm the effectiveness of AO-FL for MERC, demonstrating superior performance over state-of-the-art approaches. Moreover, AO-FL can be seamlessly integrated with various unimodal feature extractors and extended to other multimodal fusion tasks, such as MER, thereby highlighting its strong generalization beyond MERC.