We propose PhaseForensics, a DeepFake (DF) video detection method that leverages a phase-based motion representation of facial temporal dynamics. Existing methods that rely on temporal inconsistencies for DF detection offer many advantages over typical frame-based methods. However, they still show limited cross-dataset generalization and limited robustness to common distortions. These shortcomings are partly due to error-prone motion estimation and landmark tracking, or to the susceptibility of pixel intensity-based features to spatial distortions and cross-dataset domain shifts. Our key insight for overcoming these issues is to leverage the temporal phase variations in the band-pass components of the Complex Steerable Pyramid computed on face sub-regions. This not only enables a robust estimate of the temporal dynamics in these regions, but is also less prone to cross-dataset variations. Furthermore, the band-pass filters used to compute the local per-frame phase form an effective defense against the perturbations commonly seen in gradient-based adversarial attacks. Overall, with PhaseForensics, we show improved distortion and adversarial robustness, and state-of-the-art cross-dataset generalization, with 91.2% video-level AUC on the challenging CelebDFv2 dataset (a recent state-of-the-art method achieves 86.9%).
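To make the idea of temporal phase variation in a band-pass component concrete, the following is a minimal sketch and not the PhaseForensics implementation. It assumes a single Gaussian band-pass filter in the frequency domain as a stand-in for one sub-band of a Complex Steerable Pyramid. The function names (`complex_bandpass`, `local_phase`, `temporal_phase_delta`) and the parameters `center_freq`, `bandwidth`, and `orientation` are hypothetical choices made for illustration only.

```python
import numpy as np


def complex_bandpass(shape, center_freq=0.25, bandwidth=0.05, orientation=0.0):
    """One-sided Gaussian band-pass filter in the frequency domain.

    Assumption: a single oriented filter stands in for one sub-band of a
    Complex Steerable Pyramid. Frequencies are in cycles per pixel.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Centre of the pass-band; covering only one half-plane makes the
    # spatial response complex-valued, so it carries a local phase.
    cx = center_freq * np.cos(orientation)
    cy = center_freq * np.sin(orientation)
    return np.exp(-((fx - cx) ** 2 + (fy - cy) ** 2) / (2.0 * bandwidth ** 2))


def local_phase(frames, **filter_kwargs):
    """Per-frame local phase and amplitude for frames of shape (T, H, W)."""
    filt = complex_bandpass(frames.shape[1:], **filter_kwargs)
    spectrum = np.fft.fft2(frames, axes=(-2, -1))
    response = np.fft.ifft2(spectrum * filt, axes=(-2, -1))
    return np.angle(response), np.abs(response)


def temporal_phase_delta(phase):
    """Frame-to-frame phase change, wrapped to (-pi, pi]."""
    delta = np.diff(phase, axis=0)
    return np.angle(np.exp(1j * delta))


# Toy input: a horizontal sinusoid that shifts by 0.5 pixel per frame.
T, H, W = 4, 32, 64
freq, shift = 0.25, 0.5
x = np.arange(W)[None, None, :]
t = np.arange(T)[:, None, None]
frames = np.cos(2 * np.pi * freq * (x - shift * t)) * np.ones((1, H, 1))

phase, amplitude = local_phase(frames, center_freq=freq)
delta = temporal_phase_delta(phase)
```

In this toy case the phase change between consecutive frames is proportional to the displacement (here `-2 * pi * freq * shift`), which is why phase can serve as a motion cue without explicit optical flow or landmark tracking. A real pipeline would use several scales and orientations and restrict the computation to face sub-regions, as the abstract describes.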