Fake videos represent an important misinformation threat. While existing forensic networks have demonstrated strong performance on image forgeries, recent results reported on the Adobe VideoSham dataset show that these networks fail to identify fake content in videos. In this paper, we propose a new network that is able to detect and localize a wide variety of video forgeries and manipulations. To overcome the challenges that existing networks face when analyzing videos, our network combines three components: forensic embeddings that capture traces left by manipulation, context embeddings that exploit the conditional dependence of forensic traces on local scene content, and spatial attention provided by a deep, transformer-based attention mechanism. We create several new video forgery datasets and use these, along with publicly available data, to experimentally evaluate our network's performance. The results show that our proposed network is able to identify a diverse set of video forgeries, including those not encountered during training. Furthermore, our results reinforce recent findings that image forensic networks largely fail to identify fake content in videos.