With the rapid development of facial manipulation techniques, face forgery detection has received considerable attention in digital media forensics due to security concerns. Most existing methods formulate face forgery detection as a classification problem and use binary labels or manipulated-region masks as supervision. However, because these global forms of supervision do not consider the correlation between local regions, they are insufficient for learning generalized features and are prone to overfitting. To address this issue, we propose a novel perspective on face forgery detection via local relation learning. Specifically, we propose a Multi-scale Patch Similarity Module (MPSM), which measures the similarity between features of local regions and forms a robust and generalized similarity pattern. Moreover, we propose an RGB-Frequency Attention Module (RFAM) to fuse information from both the RGB and frequency domains for a more comprehensive local feature representation, which further improves the reliability of the similarity pattern. Extensive experiments show that the proposed method consistently outperforms state-of-the-art methods on widely used benchmarks. Furthermore, detailed visualizations demonstrate the robustness and interpretability of our method.
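To make the idea of a patch-level similarity pattern concrete, the following is a minimal NumPy sketch, not the MPSM implementation itself. The abstract does not specify how similarity is computed, so the choices here are assumptions: each feature map is split into a regular grid of patches, each patch is average-pooled into one descriptor, and descriptors are compared with cosine similarity. The function names `patch_similarity` and `multi_scale_similarity` and the `grid` parameter are hypothetical.

```python
import numpy as np


def patch_similarity(features, grid):
    """Pairwise cosine similarity between grid x grid patch descriptors.

    features: array of shape (C, H, W); H and W must be divisible by grid.
    Returns an array of shape (grid * grid, grid * grid).
    Assumption: average pooling and cosine similarity stand in for
    whatever the actual module uses.
    """
    c, h, w = features.shape
    if h % grid or w % grid:
        raise ValueError("H and W must be divisible by grid")
    ph, pw = h // grid, w // grid
    # Split the map into patches and average each patch into one C-dim vector.
    pooled = features.reshape(c, grid, ph, grid, pw).mean(axis=(2, 4))
    vectors = pooled.reshape(c, grid * grid).T  # (grid * grid, C)
    # L2-normalise so that the dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)
    return unit @ unit.T


def multi_scale_similarity(feature_maps, grid=4):
    """Apply patch_similarity to feature maps taken at several scales."""
    return [patch_similarity(f, grid) for f in feature_maps]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for backbone features at two resolutions (hypothetical shapes).
    maps = [rng.standard_normal((8, 16, 16)), rng.standard_normal((4, 8, 8))]
    for sim in multi_scale_similarity(maps, grid=4):
        print(sim.shape)
```

Under these assumptions, each scale yields a symmetric matrix with ones on the diagonal. Patches from a manipulated region would be expected to correlate differently with pristine patches, which is the kind of local relation the abstract describes.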