Knowledge Distillation (KD) is a widely used technique for transferring knowledge from pre-trained teacher models to (usually more lightweight) student models. However, in certain situations this technique is more of a curse than a blessing. For instance, KD poses a potential risk of exposing intellectual property (IP): even if a trained machine learning model is released as a 'black box' (e.g., as executable software or an API without open-sourced code), it can still be replicated via KD by imitating its input-output behavior. To prevent this unwanted effect of KD, this paper introduces and investigates a concept called Nasty Teacher: a specially trained teacher network that yields nearly the same performance as a normal one, but significantly degrades the performance of student models that learn by imitating it. We propose a simple yet effective algorithm for building the nasty teacher, called self-undermining knowledge distillation. Specifically, we aim to maximize the difference between the output of the nasty teacher and that of a normal pre-trained network. Extensive experiments on several datasets demonstrate that our method is effective against both standard KD and data-free KD, providing model owners with the desired KD-immunity for the first time. We hope our preliminary study can raise awareness of, and interest in, this new practical problem of both social and legal importance.
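To make the self-undermining idea concrete, below is a minimal PyTorch-style sketch of one plausible instantiation, assuming the "difference" is measured as a KL divergence between temperature-softened outputs and balanced against a standard cross-entropy term that preserves the nasty teacher's own accuracy. The weight `omega` and temperature `tau` are illustrative hyperparameters, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def self_undermining_loss(nasty_logits, normal_logits, labels,
                          omega=0.04, tau=4.0):
    """Sketch of a self-undermining KD objective (hypothetical form).

    nasty_logits:  outputs of the nasty teacher being trained
    normal_logits: outputs of a frozen, normally pre-trained network
                   (computed under torch.no_grad(), so no gradient flows)
    """
    # Standard cross-entropy keeps the nasty teacher accurate on its task.
    ce = F.cross_entropy(nasty_logits, labels)
    # KL divergence between temperature-softened predictions; maximizing it
    # pushes the nasty teacher's soft outputs away from the normal network's,
    # so a student imitating those outputs learns poorly.
    kl = F.kl_div(
        F.log_softmax(nasty_logits / tau, dim=1),
        F.softmax(normal_logits.detach() / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
    # Minimize CE while maximizing the divergence (hence the minus sign).
    return ce - omega * kl
```

Under this reading, the nasty teacher's hard predictions (and thus its accuracy) stay close to normal, while its soft output distribution, which is exactly what KD students imitate, is driven away from an informative one.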