Watermarking is a commonly used strategy to protect creators' rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models; in principle, the watermark should be preserved when an adversary tries to copy the model. In practice, however, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these techniques often fail against new or better-tuned adversaries. In this paper, we propose a certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain ℓ2 threshold. In addition to being certifiable, our watermark is also empirically more robust than previous watermarking methods. Our experiments can be reproduced with code at https://github.com/arpitbansal297/Certified_Watermarks
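To make the certification idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the smoothing is applied to model parameters with Gaussian noise of standard deviation sigma, and that the guarantee takes the percentile-smoothing form associated with Chiang et al.: for any parameter change with ℓ2 norm below eps, the p-th percentile of trigger-set accuracy stays at or above the percentile at level Phi(Phi^{-1}(p) - eps / sigma) measured on the unmodified model. The toy linear classifier, the trigger set, and all function names are hypothetical, and the empirical percentile used here ignores sampling error.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and Phi^{-1}


def trigger_accuracy(weights, trigger_set):
    """Fraction of trigger inputs a toy linear classifier labels as intended."""
    correct = 0
    for features, label in trigger_set:
        score = sum(w * x for w, x in zip(weights, features))
        correct += int((score > 0) == label)
    return correct / len(trigger_set)


def sample_noisy_accuracies(weights, trigger_set, sigma, n_samples, seed=0):
    """Trigger accuracy under Gaussian parameter noise, sorted ascending."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_samples):
        noisy = [w + rng.gauss(0.0, sigma) for w in weights]
        accs.append(trigger_accuracy(noisy, trigger_set))
    return sorted(accs)


def shifted_percentile(p, eps, sigma):
    """Assumed bound: Phi(Phi^{-1}(p) - eps / sigma)."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(p) - eps / sigma)


def certified_lower_bound(sorted_accs, p, eps, sigma):
    """Empirical accuracy at the shifted percentile (index rounded down)."""
    q = shifted_percentile(p, eps, sigma)
    idx = int(q * (len(sorted_accs) - 1))
    return sorted_accs[idx]


if __name__ == "__main__":
    # Hypothetical trigger set: (features, intended label) pairs.
    triggers = [([1.0, 0.0], True), ([0.0, 1.0], True), ([-1.0, 0.0], False)]
    accs = sample_noisy_accuracies([2.0, 2.0], triggers, sigma=0.5, n_samples=1000)
    for eps in (0.0, 0.25, 0.5, 1.0):
        bound = certified_lower_bound(accs, p=0.5, eps=eps, sigma=0.5)
        print(f"eps={eps:.2f}  lower bound on median trigger accuracy: {bound:.3f}")
```

Under these assumptions, the watermark would count as certified at radius eps whenever the printed lower bound stays above the accuracy threshold used to claim ownership. A larger sigma widens the certifiable radius but makes the noisy model's trigger accuracy harder to keep high.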