Detect and remove watermark in deep neural networks via generative adversarial networks

Deep neural networks (DNNs) have achieved remarkable performance in various fields. However, training a DNN model from scratch requires large amounts of computing resources and training data, which most individual users cannot easily obtain. As a result, model copyright infringement has emerged as a problem in recent years. For instance, pre-trained models may be stolen or abused by illegal users without the authorization of the model owner. Recently, many works on protecting the intellectual property of DNN models have been proposed. Among these, embedding a watermark into the DNN by means of a backdoor is one of the most widely used methods. However, once the DNN model is stolen, a backdoor-based watermark risks being detected and removed by an adversary. In this paper, we propose a scheme to detect and remove watermarks in deep neural networks via generative adversarial networks (GANs). We demonstrate that backdoor-based DNN watermarks are vulnerable to the proposed GAN-based watermark removal attack. The proposed attack consists of two phases. In the first phase, we use a GAN and a few clean images to detect and reverse the watermark in the DNN model. In the second phase, we fine-tune the watermarked DNN on the reversed backdoor images. Experimental evaluations on the MNIST and CIFAR10 datasets demonstrate that the proposed method can effectively remove about 98% of the watermark in DNN models: the watermark retention rate falls from 100% to less than 2% after the attack. Meanwhile, the proposed attack hardly affects the model's performance. The test accuracy of the watermarked DNN on the MNIST and CIFAR10 datasets drops by less than 1% and 3%, respectively.
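
The abstract reports the attack's effect as a change in watermark retention rate but does not define the metric. A minimal sketch follows, assuming the rate is the fraction of trigger (backdoor) inputs that the model still maps to the watermark label. The function name `watermark_retention_rate`, the toy trigger set, and the two stand-in models are hypothetical illustrations, not the authors' code or data.

```python
from typing import Callable, Hashable, Sequence, Tuple

Sample = Hashable
Label = int


def watermark_retention_rate(
    predict: Callable[[Sample], Label],
    trigger_set: Sequence[Tuple[Sample, Label]],
) -> float:
    """Fraction of trigger inputs still mapped to their watermark label.

    Assumed definition: the abstract names the metric but does not define it.
    """
    if not trigger_set:
        raise ValueError("trigger_set must not be empty")
    hits = sum(1 for x, target in trigger_set if predict(x) == target)
    return hits / len(trigger_set)


# Hypothetical trigger set: 100 backdoor inputs, all assigned watermark label 7.
trigger_set = [(f"trigger_{i}", 7) for i in range(100)]


def watermarked_model(x: Sample) -> Label:
    # Toy stand-in for the stolen model: every trigger yields the watermark label.
    return 7


def attacked_model(x: Sample) -> Label:
    # Toy stand-in for the model after the attack: one trigger still fires.
    return 7 if x == "trigger_0" else 3


before = watermark_retention_rate(watermarked_model, trigger_set)
after = watermark_retention_rate(attacked_model, trigger_set)
removed = before - after  # share of the watermark behaviour that was removed
```

Under this assumed definition, a drop from 100% to under 2% retention corresponds to the "about 98% of the watermark" removed that the abstract reports. The toy numbers here are chosen only to exercise the function.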
