AI-manipulated videos, commonly known as deepfakes, are an emerging problem. Recently, researchers in academia and industry have contributed several (self-created) benchmark deepfake datasets and deepfake detection algorithms. However, little effort has gone towards understanding deepfake videos in the wild, which limits our understanding of how applicable research contributions in this space are to the real world. Even if detection schemes perform well on existing datasets, it is unclear how well they generalize to real-world deepfakes. To bridge this gap in knowledge, we make the following contributions. First, we collect and present the largest dataset of deepfake videos in the wild, containing 1,869 videos from YouTube and Bilibili, and extract over 4.8M frames of content. Second, we present a comprehensive analysis of the growth patterns, popularity, creators, manipulation strategies, and production methods of deepfake content in the real world. Third, we systematically evaluate existing defenses using our new dataset, and observe that they are not ready for deployment in the real world. Fourth, we explore the potential for transfer learning schemes and competition-winning techniques to improve defenses.
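The abstract does not say how frame-level detector outputs are turned into video-level results. The sketch below is a hypothetical illustration, not the authors' evaluation protocol. It assumes that a detector emits a per-frame probability that the frame is fake, averages those probabilities per video, and summarizes the result with accuracy at an assumed 0.5 threshold and with ROC AUC. The function names, the mean aggregation, and the threshold are all illustrative choices.

```python
from statistics import mean


def video_scores(frame_scores):
    """Hypothetical aggregation: average per-frame fake probabilities into one score per video."""
    return {vid: mean(scores) for vid, scores in frame_scores.items()}


def accuracy(scores, labels, threshold=0.5):
    """Fraction of videos whose thresholded score matches the label (True = fake)."""
    correct = sum((scores[v] >= threshold) == labels[v] for v in scores)
    return correct / len(scores)


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random fake video outscores a random real one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [scores[v] for v in scores if labels[v]]
    neg = [scores[v] for v in scores if not labels[v]]
    if not pos or not neg:
        raise ValueError("need at least one fake and one real video")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores for four videos (True = fake).
frame_scores = {"a": [0.9, 0.7], "b": [0.2, 0.4], "c": [0.6, 0.3], "d": [0.1, 0.1]}
labels = {"a": True, "b": False, "c": True, "d": False}

per_video = video_scores(frame_scores)
print(accuracy(per_video, labels))  # 0.75: video "c" averages 0.45 and is missed
print(roc_auc(per_video, labels))   # 1.0: every fake still outranks every real video
```

The toy data shows why reporting both metrics can help. A fixed threshold can misclassify a video even when the detector ranks all fakes above all real videos, which is the kind of calibration gap that can appear when a detector meets data unlike its training set.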