Residual-domain features are very useful for Deepfake detection because they suppress irrelevant content features and preserve key manipulation traces. However, inappropriate residual prediction can degrade detection accuracy. In addition, residual-domain features are easily affected by image operations such as compression. Most existing works exploit either spatial-domain features or residual-domain features, neglecting the fact that the two types of features are mutually correlated. In this paper, we propose a guided residuals network, namely GRnet, which fuses spatial-domain and residual-domain features in a mutually reinforcing way to expose face images generated by Deepfake. Unlike existing prediction-based residual extraction methods, we propose a manipulation trace extractor (MTE) that directly removes content features and preserves manipulation traces. MTE is a fine-grained method that avoids the potential bias caused by inappropriate prediction. Moreover, an attention fusion mechanism (AFM) is designed to selectively emphasize feature channel maps and adaptively allocate weights to the two streams. Experimental results show that the proposed GRnet outperforms state-of-the-art works on four public fake face datasets: HFF, FaceForensics++, DFDC, and Celeb-DF. In particular, GRnet achieves an average accuracy of 97.72% on the HFF dataset, which is at least 5.25% higher than existing works.
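The abstract does not specify the internals of the attention fusion mechanism. The sketch below is therefore only an illustration under stated assumptions, not the authors' AFM. It shows one common way to "selectively emphasize feature channel maps and adaptively allocate weights for two streams": squeeze-and-excitation style channel attention on each stream, followed by softmax-normalized stream weights. All function names, the `params` keys, the reduction ratio `r`, and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=0):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat, w1, w2):
    """Hypothetical SE-style channel reweighting of a (C, H, W) feature map."""
    squeezed = feat.mean(axis=(1, 2))        # (C,) global average pooling
    hidden = np.maximum(0.0, w1 @ squeezed)  # (C // r,) ReLU bottleneck
    scale = sigmoid(w2 @ hidden)             # (C,) per-channel weights in (0, 1)
    return feat * scale[:, None, None]

def attention_fusion(spatial, residual, params):
    """Hypothetical two-stream fusion (assumption, not the paper's exact AFM):
    reweight channels in each stream, then mix the streams with softmax weights."""
    s = channel_attention(spatial, params["w1_s"], params["w2_s"])
    r = channel_attention(residual, params["w1_r"], params["w2_r"])
    # One scalar logit per stream from its pooled descriptor
    logits = np.array([params["v_s"] @ s.mean(axis=(1, 2)),
                       params["v_r"] @ r.mean(axis=(1, 2))])
    alpha = softmax(logits)  # stream weights, sum to 1
    return alpha[0] * s + alpha[1] * r, alpha

# Usage with random (untrained) weights
rng = np.random.default_rng(0)
C, H, W, red = 8, 4, 4, 2
params = {
    "w1_s": rng.standard_normal((C // red, C)), "w2_s": rng.standard_normal((C, C // red)),
    "w1_r": rng.standard_normal((C // red, C)), "w2_r": rng.standard_normal((C, C // red)),
    "v_s": rng.standard_normal(C), "v_r": rng.standard_normal(C),
}
spatial = rng.standard_normal((C, H, W))   # stand-in for spatial-domain features
residual = rng.standard_normal((C, H, W))  # stand-in for residual-domain features
fused, alpha = attention_fusion(spatial, residual, params)
print(fused.shape, alpha)
```

In a trained network the weights would be learned end to end, so the stream weights `alpha` would reflect how informative each stream is for a given input; the softmax keeps the two weights positive and summing to one.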