In recent years, the rapid development of face editing and generation has led to a growing number of fake videos circulating on social media, raising serious public concern. Existing frequency-domain face forgery detection methods have found that GAN-forged images exhibit obvious grid-like visual artifacts in the frequency spectrum compared to real images. For synthesized videos, however, these methods are confined to single frames and pay little attention to the most discriminative regions and to temporal frequency clues across frames. To take full advantage of the rich information in video sequences, this paper performs video forgery detection in both the spatial and temporal frequency domains and proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation. FCAN-DCT consists of a backbone network and two branches: a Compact Feature Extraction (CFE) module and a Frequency Temporal Attention (FTA) module. We conduct thorough experimental evaluations on two visible-light (VIS) datasets, WildDeepfake and Celeb-DF (v2), and on our self-built video forgery dataset DeepfakeNIR, the first video forgery dataset in the near-infrared modality. The experimental results demonstrate the effectiveness of our method in detecting forged videos in both VIS and NIR scenarios.
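The core idea of operating in both the spatial and temporal frequency domains can be illustrated with a minimal sketch. The snippet below is not the FCAN-DCT architecture (the CFE and FTA modules are not specified in this abstract); it only shows, under assumed hypothetical helper names, how one might obtain per-frame spatial DCT spectra and a per-pixel temporal DCT from a clip of grayscale frames, using an orthonormal DCT-II basis built with NumPy.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix of size n x n:
    # D[k, x] = s(k) * cos(pi * (2x + 1) * k / (2n)), with D @ D.T = I.
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)          # scale the DC row for orthonormality
    return m * np.sqrt(2.0 / n)

def spatial_dct(frames):
    # frames: (T, H, W) grayscale clip -> per-frame 2D DCT coefficients.
    # Grid-like GAN artifacts would show up as peaks in these spectra.
    _, H, W = frames.shape
    Dh, Dw = dct_matrix(H), dct_matrix(W)
    return np.einsum('ij,tjk,lk->til', Dh, frames, Dw)

def temporal_dct(frames):
    # 1D DCT along the time axis at each pixel location,
    # exposing temporal frequency clues across frames.
    Dt = dct_matrix(frames.shape[0])
    return np.einsum('st,thw->shw', Dt, frames)
```

For a constant clip, all energy concentrates in the zero-frequency coefficient, which is a quick sanity check that the transform behaves as a frequency decomposition.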