Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in generalization through data augmentation with synthetic forgeries. In this work, we explore a different path to improving generalization. Our goal is to suppress the features that are easy to learn during training, thereby reducing the risk of overfitting to specific forgery types. Specifically, a teacher network, a Vision Transformer (ViT) with diverse multi-head attention, takes face images as input and generates an attention map over the deep features. This attention map guides a student network to focus on low-attention features by suppressing the highly attended deep features. We also propose a deep feature mixup strategy that synthesizes forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method achieves promising performance on unseen forgeries and on highly compressed data.
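The abstract does not specify how the highly attended features are suppressed or how features are mixed. The following is a minimal NumPy sketch under assumptions of our own, not the authors' implementation: deep features are treated as a set of N token vectors, the teacher's attention map is a single score per token, suppression zeros out the top `drop_ratio` fraction of tokens by attention, and mixup pastes a random subset of tokens from a fake sample into a real one and labels the result as fake. The function names `suppress_high_attention` and `feature_mixup` and the parameters `drop_ratio` and `mix_ratio` are hypothetical.

```python
import numpy as np

def suppress_high_attention(features, attention, drop_ratio=0.3):
    """Hypothetical rule: zero the tokens the teacher attends to most.

    features:  (N, D) array of deep feature tokens fed to the student
    attention: (N,) array of teacher attention scores, one per token
    Returns the masked features and the boolean keep-mask.
    """
    n = features.shape[0]
    k = int(round(drop_ratio * n))  # number of tokens to suppress
    keep = np.ones(n, dtype=bool)
    if k > 0:  # guard: argsort(...)[-0:] would select every token
        top = np.argsort(attention)[-k:]  # indices of the k highest scores
        keep[top] = False
    return features * keep[:, None], keep

def feature_mixup(real_feat, fake_feat, mix_ratio=0.5, rng=None):
    """Hypothetical feature-domain forgery: paste fake tokens into a real map.

    real_feat, fake_feat: (N, D) token features of a real and a fake face
    Returns the mixed features and the label 1 (treated as fake).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = real_feat.shape[0]
    k = int(round(mix_ratio * n))  # number of tokens taken from the fake sample
    idx = rng.choice(n, size=k, replace=False)
    mixed = real_feat.copy()
    mixed[idx] = fake_feat[idx]
    return mixed, 1
```

For example, with four all-ones tokens, attention scores `[0.1, 0.9, 0.5, 0.2]`, and `drop_ratio=0.5`, the two most-attended tokens (indices 1 and 2) are zeroed, so the student must rely on the remaining, less salient tokens. Token-level hard masking is only one plausible reading of "reducing the highly-attended deep features"; soft reweighting by attention would be an equally reasonable assumption.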