Deepfake techniques have been used maliciously, prompting strong research interest in Deepfake detection methods. Deepfakes often manipulate video content by tampering with some facial parts. However, this manipulation usually breaks the consistency among facial parts; e.g., a Deepfake may change smiling lips to an upset expression while the eyes are still smiling. Existing works propose to spot inconsistency in specific facial parts (e.g., lips), but they may perform poorly if new Deepfake techniques focus on the specific facial parts used by the detector. Thus, this paper proposes a new Deepfake detection model, DeepfakeMAE, which can utilize the consistencies among all facial parts. Specifically, given real face images, we first pretrain a masked autoencoder to learn facial part consistency by randomly masking some facial parts and reconstructing the missing areas from the remaining facial parts. Furthermore, to maximize the discrepancy between real and fake videos, we propose a novel model with dual networks that utilize the pretrained encoder and decoder, respectively: 1) the pretrained encoder is finetuned to capture the overall information of the given video; 2) the pretrained decoder is used to distinguish real from fake videos, motivated by the observation that DeepfakeMAE's reconstruction should be more similar to a real face image than to a fake one. Extensive experiments on standard benchmarks demonstrate that DeepfakeMAE is highly effective; in particular, it outperforms the previous state-of-the-art method by 3.1% AUC on average in cross-dataset detection.
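
The part-level masking and the reconstruction-based motivation described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the 4x4 patch grid, the `FACIAL_PARTS` assignment of patches to parts, and the helper functions are hypothetical, and using a masked-patch mean squared error as a realness cue is one plausible reading of the motivation, not the paper's actual dual-network architecture.

```python
import random

# Hypothetical layout: a 4x4 grid of patches, indexed row-major 0..15.
# This part-to-patch assignment is illustrative and not taken from the paper.
FACIAL_PARTS = {
    "eyes": [4, 5, 6, 7],
    "nose": [9, 10],
    "lips": [13, 14],
}


def sample_part_mask(parts, num_parts_to_mask, rng):
    """Pick whole facial parts at random; return their names and patch indices."""
    chosen = rng.sample(sorted(parts), num_parts_to_mask)
    masked = sorted({idx for name in chosen for idx in parts[name]})
    return chosen, masked


def masked_reconstruction_error(original, reconstruction, masked):
    """Mean squared error computed over the masked patches only.

    `original` and `reconstruction` are lists of patches, and each patch is a
    flat list of floats of the same length.
    """
    if not masked:
        raise ValueError("at least one patch must be masked")
    total, count = 0.0, 0
    for idx in masked:
        for a, b in zip(original[idx], reconstruction[idx]):
            total += (a - b) ** 2
            count += 1
    return total / count


# Example: hide two randomly chosen facial parts of a face.
rng = random.Random(0)
chosen_parts, masked_patches = sample_part_mask(FACIAL_PARTS, 2, rng)
```

In this sketch, a reconstruction produced from the unmasked parts would be compared with the input on the masked patches; under the stated assumption, a larger error would suggest that the hidden parts are inconsistent with the rest of the face.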