Advances in image tampering pose serious security threats, underscoring the need for effective image manipulation localization (IML). While supervised IML achieves strong performance, it depends on costly pixel-level annotations. Existing weakly supervised or training-free alternatives often underperform and lack interpretability. We propose the In-Context Forensic Chain (ICFC), a training-free framework that leverages multi-modal large language models (MLLMs) for interpretable IML. ICFC integrates objectified rule construction with adaptive filtering to build a reliable knowledge base, and a multi-step progressive reasoning pipeline that mirrors expert forensic workflows, moving from coarse proposals to fine-grained forensic results. This design enables systematic exploitation of MLLM reasoning for image-level classification, pixel-level localization, and text-level interpretability. Across multiple benchmarks, ICFC not only surpasses state-of-the-art training-free methods but also achieves competitive or superior performance compared to weakly and fully supervised approaches.