Modern deepfakes evade detection because the subtle, domain-specific artifacts they leave behind are missed by single-branch networks. ForensicFlow addresses this by fusing evidence across three forensic dimensions: global visual inconsistencies (via ConvNeXt-tiny), fine-grained texture anomalies (via Swin Transformer-tiny), and spectral noise patterns (via a CNN with channel attention). Our attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive fusion weights each branch according to forgery type. Trained on CelebDF(v2) with Focal Loss, the model achieves AUC 0.9752, F1 0.9408, and accuracy 0.9208, outperforming single-stream detectors. Ablation studies confirm branch synergy, and Grad-CAM visualizations validate focus on genuine manipulation regions (e.g., facial boundaries). This multi-domain fusion strategy establishes robustness against increasingly sophisticated forgeries.
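The two aggregation steps in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the attention and gating vectors (`attn_vec`, `gate_vec`) are hypothetical stand-ins for learned parameters, and the feature dimensions are toy-sized. It shows one plausible reading of "attention-based temporal pooling" (softmax-weighted frame averaging) and "adaptive fusion" (softmax gating over the three branch embeddings):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def temporal_attention_pool(frame_feats, attn_vec):
    """Weight each frame's feature vector by its evidence score, then sum.

    frame_feats: list of T per-frame feature vectors.
    attn_vec: hypothetical learned attention vector of the same dimension.
    """
    weights = softmax([dot(f, attn_vec) for f in frame_feats])
    dim = len(frame_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, frame_feats))
            for d in range(dim)]

def adaptive_fusion(branch_feats, gate_vec):
    """Softmax-gate the pooled embeddings of the RGB, texture, and
    frequency branches into one fused clip representation."""
    weights = softmax([dot(b, gate_vec) for b in branch_feats])
    dim = len(branch_feats[0])
    return [sum(w * b[d] for w, b in zip(weights, branch_feats))
            for d in range(dim)]
```

Because both steps are convex combinations, high-scoring frames (or branches) dominate the result without any hard selection, which is what lets the network shift weight toward whichever forensic cue a given forgery type exposes.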