Facial manipulation technologies can now readily generate photorealistic faces, and their potential for malicious abuse has raised serious concern. Numerous deepfake detection methods have therefore been proposed. However, existing methods focus only on detecting one-step facial manipulation. With the emergence of easily accessible facial editing applications, people can manipulate facial components through multi-step operations applied in sequence. This new threat requires detecting a sequence of facial manipulations, which is vital both for detecting deepfake media and for subsequently recovering the original faces. Motivated by this observation, we emphasize this need and propose a novel research problem called Detecting Sequential DeepFake Manipulation (Seq-DeepFake). Unlike the existing deepfake detection task, which demands only a binary label prediction, detecting Seq-DeepFake manipulation requires correctly predicting a sequential vector of facial manipulation operations. To support a large-scale investigation, we construct the first Seq-DeepFake dataset, in which face images are manipulated sequentially and annotated with the corresponding sequential facial manipulation vectors. Based on this new dataset, we cast detecting Seq-DeepFake manipulation as a specific image-to-sequence task (e.g., image captioning) and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer). Moreover, we build a comprehensive benchmark and set up rigorous evaluation protocols and metrics for this new research problem. Extensive experiments demonstrate the effectiveness of SeqFakeFormer. We also report several observations intended to facilitate future research on broader deepfake detection problems.
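To make the contrast between a binary label and a sequential manipulation vector concrete, the following is a minimal sketch, not the dataset's actual annotation format or the benchmark's actual metrics. The component names, the `NO_MANIP` padding token, the maximum length of 5, and the position-wise accuracy function are all assumptions introduced purely for illustration.

```python
from typing import List, Sequence

# Hypothetical padding token for "no further manipulation" (an assumption).
NO_MANIP = "no_manipulation"


def pad(seq: Sequence[str], length: int) -> List[str]:
    """Pad a manipulation sequence to a fixed length with NO_MANIP."""
    if len(seq) > length:
        raise ValueError("sequence longer than the fixed length")
    return list(seq) + [NO_MANIP] * (length - len(seq))


def binary_label(seq: Sequence[str]) -> int:
    """Conventional deepfake label: 1 if any manipulation was applied."""
    return int(len(seq) > 0)


def positionwise_accuracy(pred: Sequence[str], target: Sequence[str],
                          max_len: int) -> float:
    """Fraction of positions where the padded prediction matches the target.

    An illustrative metric only; order matters, so swapping two
    operations counts as two errors.
    """
    p, t = pad(pred, max_len), pad(target, max_len)
    return sum(a == b for a, b in zip(p, t)) / max_len


# Hypothetical example: same set of edited components, different order.
target = ["eyes", "nose", "lip"]
pred = ["eyes", "lip", "nose"]

print(binary_label(pred) == binary_label(target))  # both are "fake"
print(positionwise_accuracy(pred, target, max_len=5))
```

In this sketch both sequences collapse to the same binary label, while the sequence-level comparison penalizes the swapped order, which is the distinction the Seq-DeepFake problem draws.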