As companies continue to invest heavily in larger, more accurate and more robust deep learning models, they are exploring approaches to monetize their models while protecting their intellectual property. Model licensing is promising, but requires a robust tool for owners to claim ownership of models, i.e., a watermark. Unfortunately, current designs have not been able to address piracy attacks, where third parties falsely claim model ownership by embedding their own "pirate watermarks" into an already-watermarked model. We observe that resistance to piracy attacks is fundamentally at odds with the current use of incremental training to embed watermarks into models. In this work, we propose null embedding, a new way to build piracy-resistant watermarks into DNNs that can only take place at a model's initial training. A null embedding takes a bit string (watermark value) as input, and builds strong dependencies between the model's normal classification accuracy and the watermark. As a result, attackers cannot remove an embedded watermark via tuning or incremental training, and cannot add new pirate watermarks to already-watermarked models. We empirically show that our proposed watermarks achieve piracy resistance and other watermark properties over a wide range of tasks and models. Finally, we explore a number of adaptive counter-measures, and show our watermark remains robust against a variety of model modifications, including model fine-tuning, compression, and existing methods to detect/remove backdoors. Our watermarked models are also amenable to transfer learning without losing their watermark properties.