Knowledge distillation (KD) is an efficient approach to transfer knowledge from a large "teacher" network to a smaller "student" network. Traditional KD methods require a large amount of labeled training data and a white-box teacher (i.e., one whose parameters are accessible) to train a good student. However, these resources are not always available in real-world applications. The distillation process often takes place on an external party's side, where we do not have access to much data, and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method that trains the student with only a few unlabeled training samples and a black-box teacher. Our main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder. These synthetic images, along with the labels obtained from the teacher, are used to train the student. We conduct extensive experiments to show that our method significantly outperforms recent state-of-the-art (SOTA) few-shot and zero-shot KD methods on image classification tasks. The code and models are available at: https://github.com/nphdang/FS-BBT
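To make the core idea concrete, the following is a minimal sketch (not the authors' released implementation, which additionally uses a conditional VAE): synthetic images are produced by MixUp-style convex interpolation of a few unlabeled samples and then labeled by querying the black-box teacher. The names `mixup_synthesize`, `build_distillation_set`, and `query_teacher` are hypothetical placeholders for illustration.

```python
# Minimal sketch (illustrative, not the authors' code): MixUp-based synthesis
# of out-of-distribution images from a few unlabeled samples, followed by
# labeling with a black-box teacher that only exposes its predictions.
import numpy as np

def mixup_synthesize(images, n_synthetic, alpha=1.0, rng=None):
    """Create synthetic images by convexly interpolating random image pairs."""
    rng = np.random.default_rng() if rng is None else rng
    idx_a = rng.integers(0, len(images), size=n_synthetic)
    idx_b = rng.integers(0, len(images), size=n_synthetic)
    # Mixing coefficients drawn from a Beta distribution, as in standard MixUp.
    lam = rng.beta(alpha, alpha, size=(n_synthetic, 1, 1, 1))
    return lam * images[idx_a] + (1.0 - lam) * images[idx_b]

def build_distillation_set(images, query_teacher, n_synthetic=1000):
    """Expand a small unlabeled set and label it via the black-box teacher."""
    synthetic = mixup_synthesize(images, n_synthetic)
    # The teacher is a black box: we only observe its output probabilities.
    soft_labels = np.stack([query_teacher(x) for x in synthetic])
    return synthetic, soft_labels

# Example usage with dummy data and a dummy teacher (uniform class scores).
if __name__ == "__main__":
    few_shot_images = np.random.rand(10, 32, 32, 3).astype(np.float32)
    dummy_teacher = lambda x: np.full(10, 0.1, dtype=np.float32)
    xs, ys = build_distillation_set(few_shot_images, dummy_teacher, 100)
    print(xs.shape, ys.shape)  # (100, 32, 32, 3) (100, 10)
```

The student would then be trained on these (synthetic image, teacher label) pairs in the usual distillation fashion; the teacher is only ever queried for predictions, never for its parameters.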