In recent years, model watermarking techniques have advanced significantly, yet the protection of image-processing neural networks remains a challenge, with only a few methods developed to date. These methods embed a watermark into the output images of the target generative network, so that the watermark signal can still be detected in the output of a surrogate model obtained through a model extraction attack. This promising technique, however, has certain limitations. A frequency-domain analysis reveals that the watermark signal is concealed mainly in the high-frequency components of the output. Based on this observation, we propose an overwriting attack that forges a second watermark in the output of the generative network. Experimental results demonstrate that this attack defeats existing watermarking schemes for image-processing networks with a success rate of nearly 100%. To counter the attack, we devise an adversarial framework for the watermarking network, which incorporates a specially designed adversarial training step in which the watermarking network is trained to defend against the overwriting network, thereby improving its robustness. Additionally, we identify an overfitting phenomenon in an existing watermarking method that can render it ineffective, and we modify the training process to eliminate it.
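To make the frequency-domain observation concrete, the following is a minimal sketch (not the paper's code) of how one might measure where a watermark residual's energy lies in the spectrum. It assumes `clean` and `watermarked` are same-sized grayscale outputs of the target network as NumPy arrays; the function name and the `cutoff` parameter are illustrative choices, not part of the original method.

```python
import numpy as np

def highfreq_energy_ratio(clean: np.ndarray, watermarked: np.ndarray,
                          cutoff: float = 0.25) -> float:
    """Fraction of the watermark residual's spectral energy lying outside
    a centered low-frequency square of relative half-width `cutoff`."""
    # Watermark residual: difference between watermarked and clean outputs.
    residual = watermarked.astype(np.float64) - clean.astype(np.float64)
    # 2D FFT, shifted so low frequencies sit at the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(residual))
    energy = np.abs(spectrum) ** 2

    h, w = energy.shape
    ch, cw = h // 2, w // 2
    rh, rw = int(cutoff * h), int(cutoff * w)

    # Energy inside the central low-frequency band vs. total energy.
    low = energy[ch - rh:ch + rh, cw - rw:cw + rw].sum()
    total = energy.sum()
    return float((total - low) / total)  # near 1.0 => mostly high-frequency
```

A ratio close to 1.0 under this kind of measurement would indicate that the embedded signal is concentrated in high frequencies, which is precisely what makes it vulnerable to being overwritten by another high-frequency watermark.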