With the progress of AI-based facial forgery (i.e., deepfake), people are increasingly concerned about its abuse. Although effort has been devoted to training classification models (also known as deepfake detection models) to recognize such forgeries, existing models generalize poorly to unseen forgery methods and are highly sensitive to changes in image/video quality. In this paper, we advocate adversarial training for improving generalization to both unseen facial forgeries and unseen image/video qualities. We believe that training with samples adversarially crafted to attack the classification models considerably improves their generalization ability. Considering that AI-based face manipulation often leaves high-frequency artifacts that models can easily spot yet that generalize poorly, we further propose a new adversarial training method that attempts to blur out these specific artifacts by introducing pixel-wise Gaussian blurring models. With adversarial training, the classification models are forced to learn more discriminative and generalizable features, and extensive empirical evidence verifies the effectiveness of our method. Our code will be made publicly available.
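To make the idea concrete, the sketch below shows one possible reading of adversarial training with pixel-wise blurring: a fixed-sigma Gaussian-blurred copy of each input is mixed with the original image through a learnable per-pixel weight map, and that map is optimized to maximize the detector's loss before the detector is trained on the resulting worst-case image. This is only an illustrative approximation under our own assumptions (function names such as `adversarial_blur_step` and the interpolation-based blur are hypothetical), not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF


def adversarial_blur_step(model, x, y, steps=5, step_size=0.1, sigma=2.0):
    """Craft pixel-wise blurred adversarial examples (simplified sketch).

    A fixed-sigma blurred copy of the input is mixed with the original
    through a per-pixel weight map `alpha`; `alpha` is optimized to
    maximize the detector's loss, so the attack learns *where* blurring
    out high-frequency artifacts hurts the detector most.
    """
    x_blur = TF.gaussian_blur(x, kernel_size=7, sigma=sigma)  # blur budget (assumed)
    alpha = torch.zeros_like(x[:, :1]).requires_grad_(True)   # per-pixel blur weight

    for _ in range(steps):
        w = alpha.sigmoid()
        x_adv = (1 - w) * x + w * x_blur
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, alpha)
        alpha = (alpha + step_size * grad.sign()).detach().requires_grad_(True)

    w = alpha.sigmoid()
    return ((1 - w) * x + w * x_blur).detach()


def train_step(model, optimizer, x, y):
    """One adversarial-training step: fit the detector on the blurred worst case."""
    model.eval()
    x_adv = adversarial_blur_step(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the sign-gradient update on the per-pixel weights plays the role of the adversary, and standard cross-entropy training on the resulting images plays the role of the defender; the paper's actual pixel-wise Gaussian blurring model may parameterize the blur differently.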