In recent years, deep neural networks have achieved remarkable success in both industry and academia, especially for computer vision tasks. The great success of deep learning largely stems from its scalability: it can encode large-scale data and exploit models with billions of parameters. However, deploying these cumbersome deep models on resource-limited devices, e.g., mobile phones and embedded devices, is challenging, not only because of their high computational complexity but also their large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model, and it has received rapidly increasing attention from the research community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparisons, and applications. Furthermore, challenges in knowledge distillation are briefly reviewed, and directions for future research are discussed.
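To make the teacher-student idea concrete, the sketch below illustrates the standard response-based distillation objective (soft targets from Hinton et al.), which underlies many of the algorithms surveyed here. It is a minimal illustration rather than code from this survey; the PyTorch usage and the hyperparameter names `temperature` and `alpha` are assumptions chosen for clarity.

```python
# Minimal sketch of the classic soft-target distillation loss.
# Assumes PyTorch; `temperature` and `alpha` are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term (teacher -> student) with the usual
    hard-label cross-entropy on the ground-truth labels."""
    # Soften both distributions; the T^2 factor rescales gradients
    # as in the original formulation.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(log_student, soft_targets,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In this formulation, the student is trained to match the teacher's softened class probabilities while still fitting the ground-truth labels; the categories of knowledge, training schemes, and architectures reviewed in this survey generalize this basic recipe.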