Copyright protection for deep neural networks (DNNs) is an urgent need for AI corporations. DNN watermarking is an emerging technique for tracing illegally distributed model copies by embedding and verifying secret identity messages in a model's prediction behavior or its internals. Because it sacrifices less model functionality and exploits more knowledge of the target DNN, the latter branch, called \textit{white-box DNN watermarking}, is believed to be accurate, credible, and secure against most known watermark removal attacks, and is attracting increasing research effort in both academia and industry. In this paper, we present the first systematic study of how mainstream white-box DNN watermarks are commonly vulnerable to neural structural obfuscation with \textit{dummy neurons}, i.e., neurons that can be added to a target model while leaving its behavior invariant. By devising a comprehensive framework that automatically generates and injects dummy neurons with high stealthiness, our novel attack intensively modifies the architecture of the target model and thereby inhibits the success of watermark verification. Through extensive evaluation, our work shows for the first time that nine published watermarking schemes require amendments to their verification procedures.
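To make the dummy-neuron idea concrete, the following is a minimal PyTorch sketch, not the paper's actual generation framework: it appends one extra hidden unit to a two-layer MLP and zeroes its outgoing weights, so the augmented model computes exactly the same function even though its layer shapes, and hence the parameter positions a white-box verifier would inspect, have changed.

\begin{verbatim}
# Illustrative sketch (assumed construction, not the paper's framework):
# inject one dummy neuron into the hidden layer of a two-layer MLP by adding
# a row of arbitrary incoming weights to the first layer and a zero column
# to the second layer, so the extra neuron never affects the output.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(16, 4)
y_before = model(x)

with torch.no_grad():
    fc1, fc2 = model[0], model[2]
    new_fc1 = nn.Linear(4, 9)           # 8 original units + 1 dummy unit
    new_fc1.weight[:8] = fc1.weight     # copy original incoming weights
    new_fc1.bias[:8] = fc1.bias         # dummy unit keeps its random init
    new_fc2 = nn.Linear(9, 2)
    new_fc2.weight[:, :8] = fc2.weight  # copy original outgoing weights
    new_fc2.weight[:, 8:] = 0.0         # dummy output is multiplied by zero
    new_fc2.bias.copy_(fc2.bias)

obfuscated = nn.Sequential(new_fc1, nn.ReLU(), new_fc2)
y_after = obfuscated(x)
# Different architecture (9 vs. 8 hidden units), identical input-output map.
print(torch.allclose(y_before, y_after, atol=1e-6))  # expected: True
\end{verbatim}

Under this simplified view, a verifier that reads watermark information from fixed parameter positions or neuron statistics no longer finds it where expected, even though the stolen model's functionality is fully preserved.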