Backdoor watermarking is a promising paradigm for protecting the copyright of deep neural network (DNN) models for classification tasks. Existing works on this subject have focused intensively on watermarking robustness, while fidelity, which concerns preserving the original functionality, has received less attention. In this paper, we show that the notion shared by existing works, namely measuring fidelity solely by learning accuracy, is insufficient to characterize backdoor fidelity. Meanwhile, we show that the analogous concept of embedding distortion in multimedia watermarking, interpreted as the total weight loss (TWL) in DNN backdoor watermarking, is also unsuitable for measuring fidelity. To solve this problem, we propose the concept of deep fidelity, which states that a backdoor-watermarked DNN model should preserve both the feature representation and the decision boundary of the unwatermarked host model. Accordingly, to realize deep fidelity, we propose two loss functions, termed penultimate feature loss (PFL) and softmax probability-distribution loss (SPL), to preserve the feature representation, while the decision boundary is preserved by the proposed fix last layer (FixLL) treatment, inspired by the recent discovery that deep learning with a fixed classifier causes no loss of learning accuracy. With the above designs, both embedding-from-scratch and fine-tuning strategies are implemented to evaluate the deep fidelity of backdoor embedding, whose advantages over existing methods are verified via experiments using ResNet18 for MNIST and CIFAR-10 classification, and a wide residual network (WRN28_10) for the CIFAR-100 task.
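To make the three deep-fidelity components concrete, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes penultimate-layer features and logits are available from both the watermarked model and a frozen copy of the unwatermarked host model; the attribute name `model.fc` for the final classifier and the choice of MSE for PFL and KL divergence for SPL are illustrative assumptions.

```python
import torch.nn.functional as F

def penultimate_feature_loss(feat_wm, feat_host):
    """PFL (sketch): keep the watermarked model's penultimate features
    close to those of the frozen, unwatermarked host model.
    MSE is an assumed distance; the paper may use a different one."""
    return F.mse_loss(feat_wm, feat_host)

def softmax_probability_loss(logits_wm, logits_host):
    """SPL (sketch): match the softmax probability distributions of the
    two models; KL divergence is one plausible choice."""
    return F.kl_div(
        F.log_softmax(logits_wm, dim=1),
        F.softmax(logits_host, dim=1),
        reduction="batchmean",
    )

def fix_last_layer(model):
    """FixLL (sketch): freeze the final linear classifier (assumed to be
    `model.fc`, as in torchvision's ResNet) so the host model's decision
    boundary is preserved while the backdoor is embedded."""
    for p in model.fc.parameters():
        p.requires_grad = False
```

During embedding (from scratch or by fine-tuning), these terms would be added to the usual classification loss on clean and trigger samples, so that the watermarked model learns the backdoor while staying close to the host model in both feature space and decision boundary.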