The proliferation of generative models, such as Generative Adversarial Networks (GANs), Diffusion Models, and Variational Autoencoders (VAEs), has enabled the synthesis of high-quality multimedia data. However, these advancements have also raised significant concerns regarding adversarial attacks, unethical usage, and societal harm. Recognizing these challenges, researchers have increasingly focused on developing methodologies to detect synthesized data effectively, aiming to mitigate potential risks. Prior reviews have predominantly focused on deepfake detection and often overlook recent advancements in synthetic image forensics, particularly approaches that incorporate multimodal frameworks, reasoning-based detection, and training-free methodologies. To bridge this gap, this survey provides a comprehensive and up-to-date review of state-of-the-art techniques for detecting and classifying synthetic images generated by advanced generative AI models. The review systematically examines core detection paradigms, categorizes them into spatial-domain, frequency-domain, fingerprint-based, patch-based, training-free, and multimodal reasoning-based frameworks, and offers concise descriptions of their underlying principles. We further provide detailed comparative analyses of these methods on publicly available datasets to assess their generalizability, robustness, and interpretability. Finally, the survey highlights open challenges and future directions, emphasizing the potential of hybrid frameworks that combine the efficiency of training-free approaches with the semantic reasoning of multimodal models to advance trustworthy and explainable synthetic image forensics.