Existing knowledge distillation methods mostly focus on distilling the teacher's predictions and intermediate activations. However, the structured representation, which arguably is one of the most critical ingredients of deep models, is largely overlooked. In this work, we propose a novel {\em \modelname{}} ({\bf\em \shortname{}}) method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student. The key idea is to leverage the teacher's classifier as a semantic critic for evaluating the representations of both teacher and student, and to distill semantic knowledge with high-order structured information over all feature dimensions. This is accomplished by introducing a notion of cross-network logits, computed by passing the student's representation through the teacher's classifier. Further, by treating the set of seen classes as a basis for the semantic space from a combinatorial perspective, we scale \shortname{} to unseen classes, enabling effective exploitation of abundantly available, arbitrary unlabeled training data. At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL). Extensive experiments show that our \shortname{} significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine-grained face recognition tasks, as well as on the less studied yet practically crucial task of binary network distillation. Under the more realistic open-set SSL settings we introduce, we reveal that knowledge distillation is generally more effective than existing Out-Of-Distribution (OOD) sample detection, and that our proposed \shortname{} is superior to both previous distillation and SSL competitors. The source code is available at \url{https://github.com/jingyang2017/SRD\_ossl}.