Deepfake generation techniques are evolving at a rapid pace, making it possible to create realistic manipulated images and videos and endangering the serenity of modern society. The continual emergence of new and varied techniques brings with it a further problem: deepfake detection models must be updated promptly so that they can identify manipulations carried out with even the most recent methods. This is an extremely complex problem to solve, as training a model requires large amounts of data, which are difficult to obtain when the deepfake generation method is very recent. Moreover, continuously retraining a network would be unfeasible. In this paper, we ask whether, among the various deep learning techniques, there is one able to generalise the concept of deepfake to such an extent that it does not remain tied to the specific deepfake generation methods used in the training set. We compare a Vision Transformer with an EfficientNetV2 in a cross-forgery setting based on the ForgeryNet dataset. From our experiments, it emerges that EfficientNetV2 has a greater tendency to specialise, often obtaining better results on the training methods, while Vision Transformers exhibit a superior generalisation ability that makes them more competent even on images generated with new methodologies.
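The sketch below illustrates, under stated assumptions, what the cross-forgery setting means in practice: each architecture is instantiated as a binary real/fake classifier and evaluated separately on every generation method, so accuracy on methods absent from the training split measures generalisation. This is not the authors' released code; the model names follow the timm library, and the per-method data loaders (`loaders_by_method`) are assumed to be built from ForgeryNet by the reader.

```python
# Minimal sketch of the cross-forgery evaluation setup (assumptions: timm model
# names, user-provided per-method DataLoaders yielding (image, label) batches).
import timm
import torch


def build_models(num_classes: int = 2):
    """Instantiate the two architectures compared in the paper as binary classifiers."""
    vit = timm.create_model("vit_base_patch16_224", pretrained=True,
                            num_classes=num_classes)
    effnet = timm.create_model("tf_efficientnetv2_s", pretrained=True,
                               num_classes=num_classes)
    return {"ViT": vit, "EfficientNetV2": effnet}


@torch.no_grad()
def per_method_accuracy(model, loaders_by_method, device="cpu"):
    """Evaluate one detector separately on each generation method.

    Accuracy on methods not seen during training is the generalisation signal
    discussed in the abstract.
    """
    model.eval().to(device)
    results = {}
    for method, loader in loaders_by_method.items():
        correct = total = 0
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds.cpu() == labels).sum().item()
            total += labels.numel()
        results[method] = correct / max(total, 1)
    return results
```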