Knowledge distillation has achieved remarkable results in model compression. However, most existing methods require the original training data, which in practice are often unavailable due to privacy, security, and transmission limitations. To address this problem, we propose a conditional generative data-free knowledge distillation (CGDD) framework that trains an efficient portable network without any real data. In this framework, besides the knowledge extracted from the teacher model, we introduce preset labels as additional auxiliary information to train the generator. The trained generator can then produce meaningful training samples of any specified category on demand. To strengthen the distillation process, in addition to the conventional distillation loss, we treat the preset label as the ground-truth label, so that the student network is directly supervised by the category of each synthetic training sample. Moreover, we force the student network to mimic the attention maps of the teacher model, further improving its performance. To verify the superiority of our method, we design a new evaluation metric, termed relative accuracy, which directly compares the effectiveness of different distillation methods. The portable network trained with the proposed data-free distillation method achieves 99.63%, 99.07%, and 99.84% relative accuracy on CIFAR10, CIFAR100, and Caltech101, respectively. The experimental results demonstrate the superiority of the proposed method.
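Below is a minimal sketch, not the authors' code, of how the student objective described above could be assembled: a conventional soft-label distillation term from the teacher, a cross-entropy term that treats the preset label used to condition the generator as ground truth, and an attention-transfer term that makes the student mimic the teacher's attention maps. All names, weights (alpha, beta, gamma, T), and the exact attention formulation are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Spatial attention map from a feature tensor of shape (N, C, H, W):
    average of squared activations over channels, flattened and L2-normalized."""
    att = feat.pow(2).mean(dim=1).flatten(start_dim=1)  # (N, H*W)
    return F.normalize(att, dim=1)

def student_loss(student_logits, teacher_logits, preset_labels,
                 student_feats, teacher_feats,
                 T=4.0, alpha=1.0, beta=1.0, gamma=1.0):
    # (1) conventional distillation loss on temperature-softened logits
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # (2) preset label of the synthetic sample treated as the ground-truth label
    ce = F.cross_entropy(student_logits, preset_labels)
    # (3) attention transfer: match normalized attention maps layer by layer
    at = sum(F.mse_loss(attention_map(fs), attention_map(ft))
             for fs, ft in zip(student_feats, teacher_feats))
    return alpha * kd + beta * ce + gamma * at
```

On the metric, the abstract does not define relative accuracy precisely; a natural reading is the ratio of the data-free student's accuracy to the accuracy the student reaches when distilled with the original data, which would make results directly comparable across datasets and teacher-student pairs.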