This paper provides a selective survey of the knowledge distillation (KD) framework so that researchers and practitioners can exploit it to develop new, optimized models for deep neural networks. To this end, we give a brief overview of knowledge distillation and some related works, including learning using privileged information (LUPI) and generalized distillation (GD). Although knowledge distillation based on the teacher-student architecture was initially devised as a model compression technique, it has since found versatile applications across various frameworks. In this paper, we review the characteristics of knowledge distillation from the hypothesis that its three essential ingredients are the distilled knowledge and loss, the teacher-student paradigm, and the distillation process. In addition, we survey the versatility of knowledge distillation by studying its direct applications and its use in combination with other deep learning paradigms. Finally, we present some directions for future work in knowledge distillation, including explainable knowledge distillation, in which the source of the performance gain is studied analytically, and self-supervised learning, which is an active research topic in the deep learning community.
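To make the first ingredient, the distilled knowledge and loss, concrete, the sketch below gives a minimal PyTorch-style implementation of the classic soft-target distillation loss in the style of Hinton et al.; the temperature `T` and weighting factor `alpha` are illustrative hyperparameters chosen for this example, not values prescribed by the survey.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target distillation loss: KL term on softened logits plus hard-label CE."""
    # Soften both distributions with temperature T; the KL term is scaled by T^2
    # so its gradient magnitude stays comparable to the hard-label term.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In a typical training loop, `teacher_logits` come from a frozen pre-trained teacher evaluated under `torch.no_grad()`, and only the student's parameters receive gradients from this loss.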