Current face forgery detection methods achieve high accuracy in the within-database scenario, where training and testing forgeries are synthesized by the same algorithm. However, few achieve satisfactory performance in the cross-database scenario, where training and testing forgeries are synthesized by different algorithms. In this paper, we find that current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize. Observing that image noise removes color textures and exposes discrepancies between authentic and tampered regions, we propose to use high-frequency noise for face forgery detection. We devise three functional modules to take full advantage of the high-frequency features. The first is a multi-scale high-frequency feature extraction module, which extracts high-frequency noise at multiple scales and composes it into a novel modality. The second is a residual-guided spatial attention module, which guides the low-level RGB feature extractor to concentrate more on forgery traces. The third is a cross-modality attention module, which leverages the correlation between the two complementary modalities so that each promotes feature learning in the other. Comprehensive evaluations on several benchmark databases corroborate the superior generalization performance of the proposed method.
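
To make the idea of multi-scale high-frequency noise concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not specify which filters or attention functions are used, so the 3x3 high-pass kernel, the 2x2 average-pooling downsampler, the tanh-based attention map, and all function names (`high_pass`, `downsample`, `multi_scale_residuals`, `attention_map`, `make_demo_image`) are assumptions chosen for illustration only.

```python
import math

# Hypothetical 3x3 high-pass kernel whose coefficients sum to zero, so any
# flat (constant-color) region maps to a zero residual. The abstract does
# not state which filters the authors actually use.
HIGH_PASS = [
    [-1.0, -1.0, -1.0],
    [-1.0, 8.0, -1.0],
    [-1.0, -1.0, -1.0],
]


def high_pass(img):
    """Filter a 2-D list of floats with HIGH_PASS, replicating border pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image bounds
                    xx = min(max(x + dx, 0), w - 1)
                    acc += HIGH_PASS[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def downsample(img):
    """Halve the resolution with 2x2 average pooling."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def multi_scale_residuals(img, num_scales=3):
    """Return one high-pass residual map per scale, finest scale first."""
    residuals = []
    current = img
    for _ in range(num_scales):
        residuals.append(high_pass(current))
        current = downsample(current)
    return residuals


def attention_map(residual):
    """Assumed attention: squash residual magnitude into [0, 1) with tanh."""
    return [[math.tanh(abs(v)) for v in row] for row in residual]


def make_demo_image(size=8):
    """Flat gray image with a 4x4 patch carrying a checkerboard perturbation."""
    img = [[0.5] * size for _ in range(size)]
    for y in range(2, 6):
        for x in range(2, 6):
            img[y][x] += 0.1 if (x + y) % 2 == 0 else -0.1
    return img


if __name__ == "__main__":
    residuals = multi_scale_residuals(make_demo_image(), num_scales=3)
    for level, res in enumerate(residuals):
        print(f"scale {level}: {len(res)}x{len(res[0])} residual map")
    attn = attention_map(residuals[0])
    print("attention at flat corner:", attn[0][0])
    print("attention inside patch:", attn[3][3])
```

In this toy setting the flat background produces a zero residual while the perturbed patch does not, which mirrors the observation that the noise residual suppresses color content and exposes local inconsistencies. A learned detector would replace the fixed kernel and the tanh map with trainable layers.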