Despite rapid growth in multimodal large language models (MLLMs), their reasoning traces remain opaque: it is often unclear which modality drives a prediction, how conflicts are resolved, or when one stream dominates. In this paper, we introduce modality sabotage, a diagnostic failure mode in which a high-confidence unimodal error overrides other evidence and misleads the fused result. To analyze such dynamics, we propose a lightweight, model-agnostic evaluation layer that treats each modality as an agent, producing candidate labels and a brief self-assessment used for auditing. A simple fusion mechanism aggregates these outputs, exposing contributors (modalities supporting correct outcomes) and saboteurs (modalities that mislead). In a case study on multimodal emotion recognition benchmarks with foundation models, our diagnostic layer reveals systematic reliability profiles, indicating whether failures arise from dataset artifacts or from model limitations. More broadly, our framework offers a diagnostic scaffold for multimodal reasoning, supporting principled auditing of fusion dynamics and informing possible interventions.
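To make the audit loop concrete, the sketch below illustrates one way the modality-as-agent layer could be realized: each modality emits a candidate label with a self-assessed confidence, a confidence-weighted vote produces the fused label, and comparison against a gold label partitions modalities into contributors and saboteurs. All names here (`ModalityReport`, `fuse`, `audit`) and the weighted-vote fusion rule are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the modality-as-agent audit described above.
# NOTE: this is a hypothetical illustration; the fusion rule and all
# identifiers are assumptions, not the paper's API.
from dataclasses import dataclass

@dataclass
class ModalityReport:
    modality: str      # e.g. "audio", "vision", "text"
    label: str         # candidate label proposed by this modality
    confidence: float  # self-assessed confidence in [0, 1]

def fuse(reports: list[ModalityReport]) -> str:
    """Confidence-weighted vote: sum each candidate label's confidence
    across modalities and return the label with the highest total."""
    scores: dict[str, float] = {}
    for r in reports:
        scores[r.label] = scores.get(r.label, 0.0) + r.confidence
    return max(scores, key=scores.get)

def audit(reports: list[ModalityReport], gold: str) -> dict[str, list[str]]:
    """Partition modalities into contributors (those proposing the gold
    label) and saboteurs (those whose wrong label won the fusion). A
    single high-confidence saboteur overriding correct streams is the
    'modality sabotage' failure mode."""
    fused = fuse(reports)
    result = {"fused": [fused], "contributors": [], "saboteurs": []}
    for r in reports:
        if r.label == gold:
            result["contributors"].append(r.modality)
        elif fused != gold and r.label == fused:
            result["saboteurs"].append(r.modality)
    return result

# Example: one high-confidence audio error overrides two weaker correct streams.
reports = [
    ModalityReport("audio", "angry", 0.95),
    ModalityReport("vision", "sad", 0.55),
    ModalityReport("text", "sad", 0.35),
]
print(audit(reports, gold="sad"))
# {'fused': ['angry'], 'contributors': ['vision', 'text'], 'saboteurs': ['audio']}
```

In this toy run the audio stream's confident error (0.95) outweighs the combined correct evidence (0.55 + 0.35), so the fused label is wrong and the audit flags audio as the saboteur, which is exactly the dynamic the diagnostic layer is meant to expose.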