Deep Neural Network (DNN) watermarking is a method for provenance verification of DNN models. Watermarking should be robust against watermark removal attacks, which derive a surrogate model that evades provenance verification. Many watermarking schemes that claim robustness have been proposed, but their robustness has only been validated in isolation, against a relatively small set of attacks. There is no systematic, empirical evaluation of these claims against a common, comprehensive set of removal attacks. This uncertainty about a watermarking scheme's robustness makes it difficult to trust its deployment in practice. In this paper, we evaluate whether recently proposed watermarking schemes that claim robustness are in fact robust against a large set of removal attacks. We survey methods from the literature that (i) are known removal attacks and (ii) derive surrogate models but have not been evaluated as removal attacks, and we (iii) propose novel removal attacks. Weight shifting and smooth retraining are novel removal attacks adapted to the DNN watermarking schemes surveyed in this paper. We propose taxonomies for watermarking schemes and removal attacks. Our empirical evaluation includes an ablation study over sets of parameters for each attack and watermarking scheme on the CIFAR-10 and ImageNet datasets. Surprisingly, none of the surveyed watermarking schemes is robust in practice. We find that schemes fail to withstand adaptive attacks and known methods for deriving surrogate models that have not been evaluated as removal attacks. This points to intrinsic flaws in how robustness is currently evaluated. We show that watermarking schemes need to be evaluated against a more extensive set of removal attacks and under a more realistic adversary model. Our source code and a complete dataset of evaluation results are publicly available, so that others can independently verify our conclusions.