Knowledge distillation is a technique that exploits dark knowledge to compress and transfer information from a large, well-trained neural network (teacher model) to a smaller, less capable neural network (student model) with improved inference efficiency. Distillation has gained popularity because such cumbersome models are prohibitively complex to deploy on edge computing devices. However, the teacher models used to teach smaller student models are generally cumbersome themselves and expensive to train. To eliminate the need for a cumbersome teacher model entirely, we propose a simple yet effective knowledge distillation framework that we term Dynamic Rectification Knowledge Distillation (DR-KD). Our method turns the student into its own teacher, and whenever this self-teacher makes a wrong prediction while distilling information, the error is rectified before the knowledge is distilled. Specifically, the teacher targets are dynamically adjusted using the ground truth while the knowledge gained from conventional training is distilled. Our proposed DR-KD performs remarkably well in the absence of a sophisticated cumbersome teacher model and achieves performance comparable to existing state-of-the-art teacher-free knowledge distillation frameworks while relying only on a low-cost, dynamically rectified teacher. Our approach is general and can be applied to training any deep neural network for classification or object recognition. DR-KD improves test accuracy on Tiny ImageNet by 2.65% over prominent baseline models, which is significantly better than other knowledge distillation approaches, while requiring no additional training cost.
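To make the idea of dynamic rectification concrete, the following is a minimal sketch of one plausible reading of the procedure: the student's own temperature-softened outputs serve as the teacher targets, and whenever the self-teacher's top prediction disagrees with the ground truth, the probability mass of the predicted and true classes is swapped so the rectified target ranks the correct class highest. The swap rule, the function names (`rectified_soft_targets`, `dr_kd_loss`), and the `alpha` weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def rectified_soft_targets(student_logits, labels, temperature=4.0):
    """Build self-teacher targets from the student's own softened predictions,
    rectifying them with the ground truth before distillation (assumed rule:
    swap the probabilities of the wrongly predicted class and the true class)."""
    soft = F.softmax(student_logits.detach() / temperature, dim=1)
    pred = soft.argmax(dim=1)
    wrong = pred != labels
    idx = torch.arange(soft.size(0), device=soft.device)
    rectified = soft.clone()
    # For wrongly predicted samples, exchange the mass of the predicted and true classes
    rectified[idx[wrong], labels[wrong]] = soft[idx[wrong], pred[wrong]]
    rectified[idx[wrong], pred[wrong]] = soft[idx[wrong], labels[wrong]]
    return rectified

def dr_kd_loss(student_logits, labels, temperature=4.0, alpha=0.5):
    """Combine cross-entropy on hard labels with a distillation term against the
    rectified self-teacher targets (hypothetical weighting scheme)."""
    ce = F.cross_entropy(student_logits, labels)
    targets = rectified_soft_targets(student_logits, labels, temperature)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_student, targets, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1 - alpha) * kd
```

Because the rectified targets are detached from the computation graph, the scheme adds only a cheap per-batch correction on top of standard training, consistent with the claim that no cumbersome teacher and no extra training cost are required.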