As ultra-realistic face forgery techniques emerge, deepfake detection has attracted increasing attention due to security concerns. Many detectors perform well on known forgeries but lose accuracy on unseen manipulations. We are motivated by the observation that the discrepancies between real and fake videos are extremely subtle and localized, and that inconsistencies or irregularities can appear in certain critical facial regions across several information domains. To this end, we propose a novel pipeline, Cross-Domain Local Forensics (XDLF), for more general deepfake video detection. The pipeline includes a specialized framework that simultaneously exploits local forgery patterns from the spatial, frequency, and temporal domains, and thus learns cross-domain features for detecting forgeries. Moreover, the framework uses four high-level, forgery-sensitive local regions of the human face to guide the model in enhancing subtle artifacts and localizing potential anomalies. Extensive experiments on several benchmark datasets demonstrate the strong performance of our method, which outperforms several state-of-the-art methods in cross-dataset generalization. Ablation studies examine the factors that contribute to this performance and suggest that exploiting cross-domain local characteristics is a noteworthy direction for developing more general deepfake detectors.
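To make the idea of cross-domain local features concrete, the following is a minimal sketch, not the XDLF framework itself. It computes one hand-crafted descriptor per domain (spatial statistics, high-frequency energy ratio, frame-to-frame difference) for each of four face regions. The abstract does not name the four regions or describe the learned architecture, so the region boxes, the function names, and the choice of descriptors here are all illustrative assumptions.

```python
import numpy as np

# Hypothetical region boxes (top, left, height, width) on a 128x128 aligned
# grayscale face crop. The abstract does not name the four regions; these
# boxes are placeholders chosen only for illustration.
REGIONS = {
    "region_a": (24, 16, 32, 40),
    "region_b": (24, 72, 32, 40),
    "region_c": (48, 44, 40, 40),
    "region_d": (88, 36, 28, 56),
}


def crop(clip, box):
    """Cut the same box out of every frame of a (T, H, W) clip."""
    top, left, h, w = box
    return clip[:, top:top + h, left:left + w]


def spatial_feature(patches):
    """Per-frame mean and standard deviation, averaged over time."""
    return np.array([
        patches.mean(axis=(1, 2)).mean(),
        patches.std(axis=(1, 2)).mean(),
    ])


def frequency_feature(patches):
    """Fraction of spectral energy outside a central low-frequency square."""
    spec = np.fft.fftshift(np.fft.fft2(patches, axes=(1, 2)), axes=(1, 2))
    power = np.abs(spec) ** 2
    h, w = power.shape[1:]
    ch, cw = h // 2, w // 2
    rh, rw = max(h // 8, 1), max(w // 8, 1)
    low = power[:, ch - rh:ch + rh + 1, cw - rw:cw + rw + 1].sum(axis=(1, 2))
    total = power.sum(axis=(1, 2)) + 1e-12  # avoid division by zero
    return np.array([np.mean(1.0 - low / total)])


def temporal_feature(patches):
    """Mean absolute difference between consecutive frames."""
    if patches.shape[0] < 2:
        return np.array([0.0])
    return np.array([np.abs(np.diff(patches, axis=0)).mean()])


def cross_domain_features(clip):
    """Concatenate spatial, frequency, and temporal descriptors per region.

    clip: array of shape (T, 128, 128). Returns a vector of length
    4 regions x 4 values = 16.
    """
    clip = np.asarray(clip, dtype=np.float64)
    feats = []
    for box in REGIONS.values():
        p = crop(clip, box)
        feats.append(np.concatenate([
            spatial_feature(p),
            frequency_feature(p),
            temporal_feature(p),
        ]))
    return np.concatenate(feats)
```

Calling `cross_domain_features` on a clip of shape `(T, 128, 128)` yields a 16-dimensional vector that a downstream classifier could consume. A learned model such as the one the abstract describes would replace these fixed descriptors with trainable feature extractors.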