In this paper, we propose a novel human-centered approach for detecting forgery in face images, using dynamic prototypes as a form of visual explanation. Currently, most state-of-the-art deepfake detectors are black-box models that process videos frame by frame at inference time, and few closely examine temporal inconsistencies in the videos. However, such temporal artifacts within deepfake videos are key to detecting deepfakes and explaining them to a supervising human. To this end, we propose the Dynamic Prototype Network (DPNet), an interpretable and effective solution that uses dynamic representations (i.e., prototypes) to explain deepfake temporal artifacts. Extensive experimental results show that DPNet achieves competitive predictive performance, even on unseen testing datasets such as Google's DeepFakeDetection, DeeperForensics, and Celeb-DF, while providing easy referential explanations of deepfake dynamics. On top of DPNet's prototypical framework, we further formulate temporal logic specifications based on these dynamics to check our model's compliance with desired temporal behaviors, thereby supporting the trustworthiness of such critical detection systems.