Intellectual property (IP) protection for Deep Neural Networks (DNNs) has raised serious concerns in recent years. Most existing works embed watermarks in the DNN model for IP protection, which requires modifying the model and lacks interpretability. In this paper, for the first time, we propose an interpretable intellectual property protection method for DNNs based on explainable artificial intelligence. Compared with existing works, the proposed method does not modify the DNN model, and the ownership-verification decision is interpretable. We extract the intrinsic features of the DNN model using Deep Taylor Decomposition. Since these intrinsic features constitute a unique interpretation of the model's decisions, they can be regarded as the model's fingerprint. If the fingerprint of a suspected model is the same as that of the original model, the suspected model is considered a pirated copy. Experimental results demonstrate that the fingerprints can successfully verify the ownership of the model without affecting the model's test accuracy. Furthermore, the proposed method is robust to fine-tuning, pruning, watermark-overwriting, and adaptive attacks.
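To illustrate the fingerprinting idea, the following is a minimal sketch, not the authors' implementation: it extracts a relevance map for a fixed probe input using a simplified Deep Taylor Decomposition (the z+ rule for ReLU layers) and compares the maps of two models. The toy MLP, the probe input, the use of cosine similarity, and the 0.99 matching threshold are all illustrative assumptions; the paper itself checks whether the fingerprints are the same.

```python
# Minimal sketch of DTD-based model fingerprinting (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)

def build_mlp():
    # Toy ReLU MLP standing in for the protected DNN (sizes are assumptions).
    return nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                         nn.Linear(128, 10))

def dtd_relevance(model, x):
    """Backpropagate relevance with the z+ rule (Deep Taylor Decomposition
    for ReLU networks) from the winning logit down to the input."""
    layers = [m for m in model if isinstance(m, nn.Linear)]
    # Forward pass, caching the input activation of each Linear layer.
    activations, a = [], x
    for layer in model:
        if isinstance(layer, nn.Linear):
            activations.append(a)
        a = layer(a)
    # Initialize relevance at the winning class logit.
    relevance = torch.zeros_like(a)
    relevance[0, a.argmax()] = a.max()
    # z+ rule: R_j = a_j * sum_k (w_jk^+ R_k / sum_j a_j w_jk^+).
    for layer, a_in in zip(reversed(layers), reversed(activations)):
        w_pos = layer.weight.clamp(min=0)      # positive weights, (out, in)
        z = a_in @ w_pos.t() + 1e-9            # normalizer, (1, out)
        s = relevance / z
        relevance = a_in * (s @ w_pos)         # relevance at layer input
    return relevance.detach().flatten()

def fingerprint_match(model_a, model_b, probe, threshold=0.99):
    # Cosine similarity between relevance maps; threshold is an assumption.
    ra, rb = dtd_relevance(model_a, probe), dtd_relevance(model_b, probe)
    sim = torch.cosine_similarity(ra, rb, dim=0).item()
    return sim, sim > threshold

probe = torch.rand(1, 784)                     # fixed probe input (assumption)
original = build_mlp()
pirated = build_mlp(); pirated.load_state_dict(original.state_dict())
independent = build_mlp()

print(fingerprint_match(original, pirated, probe))      # similarity = 1.0
print(fingerprint_match(original, independent, probe))  # low similarity
```

Because the relevance map is computed from the model as-is, the verification requires no modification of the protected network, consistent with the claim above.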