The recent advancement of Multimodal Large Language Models (MLLMs) is transforming human-computer interaction (HCI) from surface-level exchanges into more nuanced and emotionally intelligent communication. To realize this shift, emotion understanding becomes essential, allowing systems to capture the subtle cues underlying user intent. Furthermore, providing faithful explanations for predicted emotions is crucial to ensure interpretability and build user trust. However, current MLLM-based methods often generate emotion explanations that diverge from the target labels and sometimes even contradict their own predicted emotions. This inconsistency poses a critical risk of misunderstanding and erodes reliability in interactive settings. To address this, we propose a novel approach: the Emotional Rationale Verifier (ERV) and an Explanation Reward. Our method guides the model to produce reasoning that is explicitly consistent with the target emotion during multimodal emotion recognition, without modifying the model architecture or requiring additional paired video-description annotations. Our method significantly improves faithful explanation-prediction consistency and explanation emotion accuracy on the MAFW and DFEW datasets. Through extensive experiments and human evaluations, we show that our approach not only enhances alignment between explanation and prediction but also empowers MLLMs to deliver emotionally coherent, trustworthy interactions, marking a key step toward truly human-like HCI systems.
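
To make the idea of an explanation-consistency reward concrete, the following is a minimal sketch, not the paper's actual ERV or Explanation Reward. It assumes an off-the-shelf text emotion classifier (here the public `j-hartmann/emotion-english-distilroberta-base` model, chosen only for illustration) stands in for the verifier, and it defines the reward as the probability the verifier assigns to the target emotion given the generated explanation. The function name `explanation_reward` and this reward formulation are assumptions for illustration.

```python
# Minimal illustrative sketch (not the authors' implementation):
# a stand-in verifier scores whether a generated explanation expresses
# the target emotion, and that score is used as a scalar reward.
from transformers import pipeline

# Hypothetical stand-in for the Emotional Rationale Verifier (ERV):
# a public text emotion classifier with labels such as "anger", "joy", "sadness".
verifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return scores for all emotion labels
)

def explanation_reward(explanation: str, target_emotion: str) -> float:
    """Return the verifier's probability that `explanation` expresses `target_emotion`."""
    out = verifier(explanation)
    # Pipeline output may be a list of dicts or a nested list, depending on version.
    scores = out[0] if isinstance(out[0], list) else out
    by_label = {s["label"].lower(): s["score"] for s in scores}
    return by_label.get(target_emotion.lower(), 0.0)

# Example: reward a rationale only when it is consistent with the target label.
reward = explanation_reward(
    "The person's clenched jaw and furrowed brow suggest mounting frustration.",
    target_emotion="anger",
)
print(f"consistency reward: {reward:.3f}")
```

In a training loop, such a scalar could be added to the task objective so the MLLM is rewarded for rationales that the verifier judges consistent with the target emotion, rather than only for predicting the correct label.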