A variety of effective face-swap and face-reenactment methods have been published in recent years, democratizing face synthesis technology to a great extent. Videos generated this way have come to be collectively called deepfakes, a term that carries a negative connotation because of the social problems they have caused. In response to this emerging threat, we have built the Korean DeepFake Detection Dataset (KoDF), a large-scale collection of synthesized and real videos focused on Korean subjects. In this paper, we describe in detail the methods used to construct the dataset, show experimentally that the distribution of KoDF differs from those of existing deepfake detection datasets, and underline the importance of using multiple datasets for real-world generalization. KoDF is publicly available in its entirety (i.e., real clips, synthesized clips, clips with additive noise, and their corresponding metadata) at https://moneybrain-research.github.io/kodf.