Advances in generative modeling have made it increasingly easy to fabricate realistic portrayals of individuals, creating serious risks for security, communication, and public trust. Detecting such person-driven manipulations requires systems that not only distinguish altered content from authentic media but also provide clear and reliable reasoning. In this paper, we introduce TriDF, a comprehensive benchmark for interpretable DeepFake detection. TriDF contains high-quality forgeries produced by advanced synthesis models, covering 16 DeepFake types across image, video, and audio modalities. The benchmark evaluates three key aspects: Perception, which measures a model's ability to identify fine-grained manipulation artifacts against human-annotated evidence; Detection, which assesses classification performance across diverse forgery families and generators; and Hallucination, which quantifies unsupported content in model-generated explanations and thus their reliability. Experiments on state-of-the-art multimodal large language models show that accurate perception is essential for reliable detection, yet hallucination can severely disrupt decision-making, revealing the interdependence of the three aspects. TriDF provides a unified framework for understanding the interplay of detection accuracy, evidence identification, and explanation reliability, offering a foundation for building trustworthy systems that address real-world synthetic media threats.