With the rapid development of deep generative models (such as Generative Adversarial Networks and autoencoders), AI-synthesized images of human faces are now of such high quality that humans can hardly distinguish them from pristine ones. Although existing detection methods perform well in specific evaluation settings, e.g., on images from seen models or on images without real-world post-processing, they tend to suffer serious performance degradation in real-world scenarios, where test images may be produced by more powerful generation models or subjected to various post-processing operations. To address this issue, we propose Global and Local Feature Fusion (GLFF), a framework for face forgery detection that learns rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features, and a local branch that selects informative patches and extracts detailed local artifacts from them. Because no existing face forgery dataset simulates real-world applications for evaluation, we further create a challenging face forgery dataset, named DeepFakeFaceForensics (DF^3), which contains six state-of-the-art generation models and a variety of post-processing techniques to approximate real-world scenarios. Experimental results demonstrate the superiority of our method over state-of-the-art methods on the proposed DF^3 dataset and three other open-source datasets.
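The two-branch idea described above can be illustrated with a toy sketch. This is not the GLFF architecture: the abstract does not say how patches are scored or how the branches are fused, so the sketch assumes average pooling at several window sizes as a stand-in for multi-scale global features, pixel variance as a stand-in for patch informativeness, and plain concatenation as the fusion step. All function names (`avg_pool`, `global_features`, `top_patches`, `fuse`) are hypothetical.

```python
from statistics import mean, pvariance


def avg_pool(img, k):
    """Average-pool a square 2D list with a k x k window and stride k."""
    n = len(img)
    return [
        [mean(img[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(0, n - k + 1, k)]
        for r in range(0, n - k + 1, k)
    ]


def global_features(img, scales=(2, 4)):
    """Stand-in global branch (assumption): flattened pooled maps at several scales."""
    feats = []
    for k in scales:
        for row in avg_pool(img, k):
            feats.extend(row)
    return feats


def top_patches(img, size=2, top_k=2):
    """Stand-in local branch (assumption): keep the top_k non-overlapping
    patches with the highest pixel variance."""
    n = len(img)
    patches = []
    for r in range(0, n - size + 1, size):
        for c in range(0, n - size + 1, size):
            pixels = [img[r + i][c + j] for i in range(size) for j in range(size)]
            patches.append((pvariance(pixels), (r, c), pixels))
    # Stable sort: patches with equal variance keep their scan order.
    patches.sort(key=lambda p: p[0], reverse=True)
    return patches[:top_k]


def fuse(img, scales=(2, 4), size=2, top_k=2):
    """Fuse both branches by concatenation (assumption, not the paper's method)."""
    fused = global_features(img, scales)
    for variance, _, pixels in top_patches(img, size, top_k):
        # Each selected patch contributes two local features: mean and variance.
        fused.extend([mean(pixels), variance])
    return fused


if __name__ == "__main__":
    # A flat 4 x 4 "image" with one textured corner.
    image = [[0.0] * 4 for _ in range(4)]
    image[2][2] = 1.0
    image[3][3] = 1.0
    print(top_patches(image, size=2, top_k=1)[0][1])  # location of the selected patch
    print(fuse(image, top_k=1))
```

In a real detector both branches would be learned networks and the patch selection would be trained rather than hand-crafted; the sketch only shows how whole-image features and features from a few selected patches end up in a single representation.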