DeepFakes: a New Threat to Face Recognition? Assessment and Detection

It is becoming increasingly easy to automatically replace the face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help develop such methods, in this paper we present the first publicly available set of Deepfake videos, generated from videos of the VidTIMIT database. We used open source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulting videos. To demonstrate this impact, we generated videos with low and high visual quality (320 videos each) using differently tuned parameter sets. We show that state-of-the-art face recognition systems based on VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates, respectively, which means that methods for detecting Deepfake videos are necessary. Considering several baseline approaches, we found that an audio-visual approach based on lip-sync inconsistency detection was not able to distinguish Deepfake videos. The best performing method, which is based on visual quality metrics and is often used in the presentation attack detection domain, resulted in an 8.97% equal error rate on high quality Deepfakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and that further development of face swapping technology will make them even more so.
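The two figures of merit quoted above, false acceptance rate (FAR) and equal error rate (EER), can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names `far_frr` and `equal_error_rate`, the convention that a higher score means "more likely genuine", and the sample scores are all assumptions made for illustration.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at a decision threshold.

    Assumed convention: a sample is accepted when score >= threshold.
    FAR is the fraction of impostor (e.g. Deepfake) scores accepted,
    FRR is the fraction of genuine scores rejected.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = float(np.mean(impostor >= threshold))
    frr = float(np.mean(genuine < threshold))
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold
    and picking the one where FAR and FRR are closest."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2.0

# Made-up scores, for illustration only (not data from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
deepfake_scores = [0.6, 0.3, 0.2, 0.1]

far, frr = far_frr(genuine_scores, deepfake_scores, threshold=0.5)
eer = equal_error_rate(genuine_scores, deepfake_scores)
print(f"FAR={far:.2%} FRR={frr:.2%} EER={eer:.2%}")
```

Read this way, a high FAR for a face recognition system means that most Deepfake videos are accepted as the target identity, while a low EER for a detector means that it separates genuine videos from Deepfakes with few errors of either kind.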
