Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization

Advances in image tampering pose serious security threats, underscoring the need for effective image manipulation localization (IML). While supervised IML achieves strong performance, it depends on costly pixel-level annotations. Existing weakly supervised or training-free alternatives often underperform and lack interpretability. We propose the In-Context Forensic Chain (ICFC), a training-free framework that leverages multi-modal large language models (MLLMs) for interpretable IML. ICFC combines two components: objectified rule construction with adaptive filtering, which builds a reliable knowledge base, and a multi-step progressive reasoning pipeline that mirrors expert forensic workflows, moving from coarse proposals to fine-grained forensic results. This design enables systematic exploitation of MLLM reasoning for image-level classification, pixel-level localization, and text-level interpretability. Across multiple benchmarks, ICFC not only surpasses state-of-the-art training-free methods but also achieves competitive or superior performance compared to weakly and fully supervised approaches.
