Deepfake has recently drawn considerable public attention due to security and privacy concerns in social media digital forensics. As Deepfake videos spreading widely on the Internet become increasingly realistic, traditional detection techniques fail to distinguish real from fake. Most existing deep learning methods rely on convolutional neural networks as a backbone and therefore mainly capture local features and relations within the face image. However, local features and relations alone do not provide enough general information for a model to learn Deepfake detection, so existing methods have reached a bottleneck in further improving detection performance. To address this issue, we propose a deep convolutional Transformer that incorporates decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and improve efficacy. Moreover, we exploit the rarely discussed image keyframes in model training and visualize the gap in feature quantity, caused by video compression, between keyframes and normal frames. Finally, we demonstrate transferability through extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within-dataset and cross-dataset evaluations.
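To make the two mechanisms named above concrete, the following is a minimal sketch, not the authors' implementation: a single attention block in which the key/value tokens are downsampled by a strided convolution (convolutional pooling, injecting local structure and reducing sequence length) and the per-head attention maps are remixed by a learnable head-mixing matrix (re-attention, in the spirit of DeepViT). All module names, dimensions, and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: convolutional pooling + re-attention in one block.
import torch
import torch.nn as nn

class ConvPoolReAttention(nn.Module):
    def __init__(self, dim=384, heads=6, pool_stride=2, tokens_hw=14):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.hw = tokens_hw                      # spatial side of the token grid (assumed)
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        # convolutional pooling over the 2-D token grid for keys/values
        self.pool = nn.Conv2d(dim, dim, kernel_size=pool_stride, stride=pool_stride)
        # re-attention: learnable mixing of attention maps across heads
        self.reattn = nn.Parameter(torch.eye(heads) + 0.01 * torch.randn(heads, heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, C) with N = hw * hw
        B, N, C = x.shape
        q = self.q(x).view(B, N, self.heads, C // self.heads).transpose(1, 2)
        # pool tokens on the spatial grid before computing keys/values
        grid = x.transpose(1, 2).view(B, C, self.hw, self.hw)
        pooled = self.pool(grid).flatten(2).transpose(1, 2)      # (B, N', C), N' < N
        k, v = self.kv(pooled).chunk(2, dim=-1)
        k = k.view(B, -1, self.heads, C // self.heads).transpose(1, 2)
        v = v.view(B, -1, self.heads, C // self.heads).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = torch.einsum('hg,bgnm->bhnm', self.reattn, attn)  # re-attention across heads
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Under these assumptions, pooling shrinks the key/value sequence from 196 to 49 tokens while the queries keep full resolution, and the head-mixing matrix lets heads share attention evidence rather than collapsing to similar maps.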