Diffusion-based editing enables realistic modification of local image regions, making AI-generated content harder to detect. Existing AIGC detection benchmarks focus on classifying entire images and overlook the localization of diffusion-based edits. We introduce DiffSeg30k, a publicly available dataset of 30k diffusion-edited images with pixel-level annotations, designed to support fine-grained detection. DiffSeg30k features: 1) In-the-wild images: we collect images or image prompts from COCO to reflect real-world content diversity; 2) Diverse diffusion models: local edits are produced by eight SOTA diffusion models; 3) Multi-turn editing: each image undergoes up to three sequential edits to mimic real-world iterative editing; and 4) Realistic editing scenarios: a vision-language model (VLM)-based pipeline automatically identifies meaningful regions and generates context-aware prompts covering object additions, removals, and attribute changes. DiffSeg30k shifts AIGC detection from binary classification to semantic segmentation, enabling simultaneous localization of edited regions and identification of the editing model. We benchmark three baseline segmentation approaches and find that the segmentation task remains challenging, particularly with respect to robustness against image distortions. Experiments also reveal that segmentation models, despite being trained for pixel-level localization, emerge as highly reliable whole-image classifiers of diffusion edits, outperforming established forgery classifiers while showing strong potential for cross-generator generalization. We believe DiffSeg30k will advance research in fine-grained localization of AI-generated content by demonstrating both the promise and the limitations of segmentation-based methods. DiffSeg30k is released at: https://huggingface.co/datasets/Chaos2629/Diffseg30k
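As a minimal sketch of how the released dataset could be accessed, the snippet below loads it with the Hugging Face `datasets` library. The split name and the field names ("image", "mask") are assumptions for illustration; the actual schema is documented on the dataset card.

```python
# Hypothetical loading sketch for DiffSeg30k; field names and split are assumptions.
from datasets import load_dataset

ds = load_dataset("Chaos2629/Diffseg30k", split="train")  # split name assumed
sample = ds[0]
image = sample["image"]  # assumed: the (possibly multi-turn) edited image
mask = sample["mask"]    # assumed: per-pixel labels (0 = real, 1..8 = editing model)
```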
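To make the segmentation-as-classification finding concrete, the sketch below shows one plausible way to aggregate per-pixel segmentation probabilities into a whole-image "diffusion-edited" score. The label layout (class 0 = real, classes 1..8 = the eight editors) and the max-pooling rule are illustrative assumptions, not the paper's stated procedure.

```python
import numpy as np

def image_level_score(seg_probs: np.ndarray, num_edit_classes: int = 8) -> float:
    """Aggregate pixel-level predictions into an image-level edit score.

    seg_probs: (C, H, W) per-pixel class probabilities, where class 0 is
    assumed to be "real/unedited" and classes 1..num_edit_classes map to
    the diffusion editors (hypothetical label layout).
    """
    # Probability that each pixel was edited by *any* diffusion model.
    edited_prob = seg_probs[1 : 1 + num_edit_classes].sum(axis=0)  # (H, W)
    # Flag the image by its most confidently "edited" pixel.
    return float(edited_prob.max())
```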