Video synthesis methods have improved rapidly in recent years, making it easy to create synthetic humans. This poses a problem, especially in the era of social media, as synthetic videos of speaking humans can be used to spread misinformation convincingly. Thus, there is a pressing need for accurate and robust deepfake detection methods that can detect forgery techniques not seen during training. In this work, we explore whether this can be done by leveraging a multi-modal, out-of-domain backbone trained in a self-supervised manner and adapted to the video deepfake domain. We propose FakeOut, a novel approach that relies on multi-modal data throughout both the pre-training phase and the adaptation phase. We demonstrate the efficacy and robustness of FakeOut in detecting various types of deepfakes, especially manipulations that were not seen during training. Our method achieves state-of-the-art results in cross-manipulation and cross-dataset generalization. This study shows that, perhaps surprisingly, training on out-of-domain videos (i.e., videos with no speaking humans) can lead to better deepfake detection systems. Code is available on GitHub.