The proliferation of multimodal misinformation poses a growing threat to public discourse and societal trust. While Large Vision-Language Models (LVLMs) have enabled recent progress in multimodal misinformation detection (MMD), the rise of generative AI (GenAI) tools introduces a new challenge: GenAI-driven news diversity, characterized by highly varied and complex content. We show that this diversity induces multi-level drift, comprising (1) model-level misperception drift, where stylistic variations disrupt a model's internal reasoning, and (2) evidence-level drift, where expression diversity degrades the quality or relevance of retrieved external evidence. These drifts significantly degrade the robustness of current LVLM-based MMD systems. To study this problem systematically, we introduce DriftBench, a large-scale benchmark comprising 16,000 news instances across six categories of diversification. We design three evaluation tasks: (1) robustness of truth verification under multi-level drift; (2) susceptibility to adversarial evidence contamination generated by GenAI; and (3) analysis of reasoning consistency across diverse inputs. Experiments with six state-of-the-art LVLM-based detectors show substantial performance drops (an average F1 decrease of 14.8%) and increasingly unstable reasoning traces, with even more severe failures under adversarial evidence injection. Our findings uncover fundamental vulnerabilities in existing MMD systems and underscore the urgent need for more resilient approaches in the GenAI era.