Trigger set-based watermarking schemes have gained increasing attention because they provide deep neural network model owners with a means to prove ownership. In this paper, we argue that state-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership. We posit that this impaired capability stems from two experimental flaws common in existing research practice when evaluating the robustness of watermarking algorithms: (1) incomplete adversarial evaluation and (2) overlooked adaptive attacks. We conduct a comprehensive adversarial evaluation of 11 representative watermarking schemes against six existing attacks and demonstrate that each of these schemes lacks robustness against at least two non-adaptive attacks. We also propose novel adaptive attacks that harness the adversary's knowledge of the watermarking algorithm underlying a target model. We demonstrate that the proposed attacks effectively break all 11 watermarking schemes, allowing adversaries to obscure the ownership of any watermarked model. We encourage follow-up studies to consider our guidelines when evaluating the robustness of their watermarking schemes by conducting a comprehensive adversarial evaluation, including our adaptive attacks, to demonstrate a meaningful upper bound on watermark robustness.
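To make the ownership-verification step concrete, the following is a minimal sketch of how trigger set-based verification is typically performed: the owner queries the suspect model on a secret trigger set and claims ownership if the model reproduces the trigger labels at a sufficiently high rate. This is an illustrative assumption, not the procedure of any specific scheme evaluated here; the names `model`, `trigger_inputs`, `trigger_labels`, and the 0.9 threshold are hypothetical.

```python
# Illustrative sketch of trigger set-based ownership verification.
# `model`, `trigger_inputs`, `trigger_labels`, and the threshold are
# hypothetical placeholders, not drawn from any evaluated scheme.
import numpy as np

def verify_watermark(model, trigger_inputs, trigger_labels, threshold=0.9):
    """Claim ownership if the suspect model reproduces the secret trigger
    labels at a rate above `threshold` (the exact decision rule and
    threshold vary across schemes)."""
    predictions = model.predict(trigger_inputs)  # suspect model's class scores
    trigger_accuracy = np.mean(np.argmax(predictions, axis=1) == trigger_labels)
    return trigger_accuracy >= threshold  # True -> ownership is claimed
```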