Research on the detection of AI-generated videos has focused almost exclusively on face videos, usually referred to as deepfakes. Manipulations such as face swapping, face reenactment and expression manipulation have been the subject of intense research, leading to the development of a number of efficient tools for distinguishing artificial videos from genuine ones. Much less attention has been paid to the detection of artificial non-facial videos. Yet, new tools for generating such videos are being developed at a fast pace and will soon reach the quality level of deepfake videos. The goal of this paper is to investigate the detectability of a new kind of AI-generated video depicting driving street sequences (here referred to as DeepStreets videos), which, by their nature, cannot be analysed with the same tools used for facial deepfakes. Specifically, we present a simple frame-based detector that achieves very good performance on state-of-the-art DeepStreets videos generated by the Vid2vid architecture. Notably, the detector retains very good performance on compressed videos, even when the compression level used during training does not match that used for the test videos.