The inadvertent stealing of private/sensitive information through Knowledge Distillation (KD) has received significant attention recently and, given its critical nature, has guided subsequent defense efforts. The recent work Nasty Teacher proposed developing teachers that cannot be distilled or imitated by models attacking them. However, the promise of confidentiality offered by a nasty teacher is not well studied, and as a further step toward strengthening against such loopholes, we attempt to bypass its defense and successfully steal (or extract) information in its presence. Specifically, we analyze Nasty Teacher from two different directions and subsequently leverage them carefully to develop simple yet effective methodologies, named HTC and SCM, which increase the learning from Nasty Teacher by up to 68.63% on standard datasets. Additionally, we explore an improved defense method based on our insights into stealing. Our detailed set of experiments and ablations on diverse models/settings demonstrates the efficacy of our approach.