Trigger set-based watermarking schemes have attracted growing attention because they give deep neural network model owners a means to prove ownership. In this paper, we argue that state-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership. We posit that this shortcoming stems from two experimental flaws common in existing research practice when evaluating the robustness of watermarking algorithms: (1) incomplete adversarial evaluation and (2) overlooked adaptive attacks. We conduct a comprehensive adversarial evaluation of 10 representative watermarking schemes against six existing attacks and demonstrate that each of these schemes lacks robustness against at least two of the attacks. We also propose novel adaptive attacks that harness the adversary's knowledge of the watermarking algorithm underlying a target model. We demonstrate that the proposed attacks effectively break all 10 watermarking schemes, consequently allowing adversaries to obscure the ownership of any watermarked model. We encourage follow-up studies to consider our guidelines when evaluating the robustness of their watermarking schemes by conducting a comprehensive adversarial evaluation that includes our adaptive attacks to demonstrate a meaningful upper bound of watermark robustness.