Deepfakes refer to content synthesized using deep generators, which, when \emph{misused}, have the potential to erode trust in digital media. Synthesizing high-quality deepfakes requires access to large and complex generators that only a few entities can train and provide. The threat comes from malicious users who exploit access to a provided model to generate harmful deepfakes without risking detection. Watermarking makes deepfakes detectable by embedding an identifiable code into the generator that can later be extracted from its generated images. We propose Pivotal Tuning Watermarking (PTW), a method for watermarking pre-trained generators that (i) is three orders of magnitude faster than watermarking from scratch and (ii) requires no training data. We improve existing watermarking methods and scale to generators $4 \times$ larger than those in related work. PTW can embed longer codes than existing methods while better preserving the generator's image quality. We propose rigorous, game-based definitions of robustness and undetectability, and our study reveals that watermarking is not robust against an adaptive white-box attacker who has control over the generator's parameters. We propose an adaptive attack that can successfully remove any watermark with access to only $200$ non-watermarked images. Our work challenges the trustworthiness of watermarking for deepfake detection when the parameters of a generator are available.