Deepfakes are synthetically generated images, videos, or audio that fraudsters use to manipulate legitimate information. Current deepfake detection systems struggle to generalise to unseen data. To address this, we employ three deep Convolutional Neural Network (CNN) models, (1) VGG16, (2) InceptionV3, and (3) XceptionNet, to classify fake and real images extracted from videos. We also construct a fusion of the deep CNN models to improve robustness and generalisation capability. The proposed technique outperforms state-of-the-art models, achieving 96.5% accuracy on the publicly available DeepFake Detection Challenge (DFDC) test data, which comprises 400 videos. The fusion model achieves 99% accuracy on lower-quality DeepFake-TIMIT dataset videos and 91.88% on higher-quality DeepFake-TIMIT videos. In addition, we show that prediction fusion is more robust against adversarial attacks: if one model is compromised by an adversarial attack, the prediction fusion prevents it from affecting the overall classification.
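To make the fusion idea concrete, the sketch below shows one plausible way to combine the three backbones at prediction level. It is only an illustration under stated assumptions, not the authors' implementation: the binary sigmoid head, the 299x299 input size, and the simple unweighted averaging are assumptions introduced here.

```python
# Minimal sketch of prediction-level fusion of VGG16, InceptionV3 and Xception.
# Assumptions (not from the paper): ImageNet-pretrained backbones, a single
# sigmoid "fake probability" head per model, and an unweighted average.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16, InceptionV3, Xception

def build_classifier(backbone_cls, input_shape=(299, 299, 3)):
    """Wrap a pretrained backbone with a small binary (real/fake) head."""
    backbone = backbone_cls(include_top=False, weights="imagenet",
                            input_shape=input_shape, pooling="avg")
    fake_prob = layers.Dense(1, activation="sigmoid")(backbone.output)
    return models.Model(backbone.input, fake_prob)

# One classifier per backbone; each would be fine-tuned separately on
# images extracted from the training videos before fusion.
classifiers = [build_classifier(b) for b in (VGG16, InceptionV3, Xception)]

def fused_prediction(image_batch):
    """Average the per-model fake probabilities (prediction fusion).

    A single fooled or compromised model shifts the averaged score by
    only about one third, which is the robustness argument above.
    """
    probs = [m.predict(image_batch, verbose=0) for m in classifiers]
    return np.mean(probs, axis=0)  # shape (batch, 1): fused fake probability
```

In this kind of scheme the final label comes from thresholding the averaged probability, so an adversarial perturbation crafted against one backbone has limited influence on the fused decision.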