Deep-learning-based face-swap videos, also known as deepfakes, are becoming increasingly realistic and deceptive. The malicious use of these face-swap videos has raised widespread concern. The research community has been focusing on the automatic detection of these fake videos, but the assessment of their visual realism, as perceived by human eyes, is still an unexplored dimension. Visual realism assessment (VRA) is essential for estimating the potential impact of a specific face-swap video, and it also serves as a quality metric for comparing different face-swap methods. In this paper, we take a first step towards this new VRA direction by building a benchmark for evaluating the effectiveness of different automatic VRA models, ranging from traditional hand-crafted features to various kinds of deep-learning features. The evaluations are based on a recent competition dataset named DFGC 2022, which contains 1400 diverse face-swap videos annotated with Mean Opinion Scores (MOS) on visual realism. Comprehensive experimental results using 11 models and 3 protocols are presented and discussed. We demonstrate the feasibility of devising effective VRA models for assessing face-swap videos and methods. The particular usefulness of existing deepfake-detection features for VRA is also noted. The code and benchmark will be made publicly available.