Multimodal Fact-Checking: An Agent-based Approach

The rapid spread of multimodal misinformation poses a growing challenge for automated fact-checking systems. Existing approaches, including large vision-language models (LVLMs) and deep multimodal fusion methods, often fall short because their reasoning is limited and their use of evidence is shallow. A key bottleneck is the lack of dedicated datasets that provide complete real-world multimodal misinformation instances together with annotated reasoning processes and verifiable evidence. To address this limitation, we introduce RW-Post, a high-quality, explainable dataset for real-world multimodal fact-checking. RW-Post aligns real-world multimodal claims with their original social media posts, preserving the rich context in which the claims were made. The dataset also includes detailed reasoning and explicitly linked evidence, both derived from human-written fact-checking articles through a large language model-assisted extraction pipeline, which enables comprehensive verification and explanation. Building on RW-Post, we propose AgentFact, an agent-based multimodal fact-checking framework designed to emulate the human verification workflow. AgentFact consists of five specialized agents that collaboratively handle key fact-checking subtasks: strategy planning, high-quality evidence retrieval, visual analysis, reasoning, and explanation generation. These agents are orchestrated through an iterative workflow that alternates between evidence search and task-aware evidence filtering and reasoning, which supports strategic decision-making and systematic evidence analysis. Extensive experimental results demonstrate that combining RW-Post with AgentFact substantially improves both the accuracy and the interpretability of multimodal fact-checking.
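
To make the iterative workflow concrete, the following is a minimal sketch of how five agents might alternate between evidence search and task-aware filtering and reasoning. It is not AgentFact's actual implementation: every name here (`CheckState`, `run_workflow`, the agent callables, the `"unverified"` label, and `max_rounds`) is a hypothetical stand-in, and the toy agents are plain functions rather than model-backed components.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CheckState:
    """Hypothetical shared state passed between agents."""
    claim: str
    evidence: List[str] = field(default_factory=list)
    verdict: str = "unverified"
    rounds: int = 0


def run_workflow(
    claim: str,
    plan: Callable[[CheckState], str],           # strategy planning: state -> search query
    search: Callable[[str], List[str]],          # evidence retrieval: query -> candidates
    keep: Callable[[CheckState, str], bool],     # task-aware filter (assumed helper)
    analyze_image: Callable[[str], str],         # visual analysis: claim -> notes
    reason: Callable[[CheckState, str], str],    # reasoning: state + notes -> verdict
    explain: Callable[[CheckState], str],        # explanation generation
    max_rounds: int = 3,                         # assumed stopping bound
) -> Tuple[CheckState, str]:
    state = CheckState(claim=claim)
    visual_notes = analyze_image(claim)
    while state.rounds < max_rounds:
        state.rounds += 1
        # Search step: the planner picks a query, the retriever fetches candidates.
        candidates = search(plan(state))
        # Filter step: keep only new evidence that is relevant to the task.
        for item in candidates:
            if item not in state.evidence and keep(state, item):
                state.evidence.append(item)
        # Reasoning step: stop as soon as the reasoner commits to a verdict.
        state.verdict = reason(state, visual_notes)
        if state.verdict != "unverified":
            break
    return state, explain(state)


# Toy agents, for illustration only.
def toy_plan(state: CheckState) -> str:
    return f"{state.claim} (round {state.rounds})"

def toy_search(query: str) -> List[str]:
    return ["official report: photo taken in 2015", "unrelated blog post"]

def toy_keep(state: CheckState, item: str) -> bool:
    return "photo" in item

def toy_analyze_image(claim: str) -> str:
    return "image shows a flooded street"

def toy_reason(state: CheckState, notes: str) -> str:
    return "false" if any("2015" in e for e in state.evidence) else "unverified"

def toy_explain(state: CheckState) -> str:
    return f"Verdict '{state.verdict}' from {len(state.evidence)} evidence item(s)."


final_state, explanation = run_workflow(
    "Photo shows the 2023 flood",
    toy_plan, toy_search, toy_keep, toy_analyze_image, toy_reason, toy_explain,
)
print(final_state.verdict, "|", explanation)
```

Passing the agents in as callables keeps the control loop separate from whatever model or tool backs each agent, so any single stage can be swapped or tested on its own.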
