One of the most efficient methods for model compression is hint distillation, in which the student model is injected with information (hints) from several layers of the teacher model. Although the choice of hint points can drastically alter compression performance, conventional distillation approaches overlook this fact and reuse the hint points established in early studies. We therefore propose a clustering-based hint selection methodology, in which the layers of the teacher model are clustered with respect to several metrics and the cluster centers are used as the hint points. Once applied to a chosen teacher network, our method is applicable to any student network. The proposed approach is validated on the CIFAR-100 and ImageNet datasets, using various teacher-student pairs and numerous hint distillation methods. Our results show that the hint points selected by our algorithm yield superior compression performance compared to state-of-the-art knowledge distillation algorithms on the same student models and datasets.
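To make the idea concrete, the sketch below illustrates one way the selection step could look: each teacher layer is summarized by a vector of per-layer metrics, the layers are clustered, and the layer nearest each cluster center becomes a hint point. The specific metrics, the use of k-means, and the function name are illustrative assumptions, not the exact recipe described in the paper.

import numpy as np
from sklearn.cluster import KMeans

def select_hint_points(layer_metrics: np.ndarray, num_hints: int) -> list[int]:
    """Cluster teacher layers and return, for each cluster, the index of the
    layer closest to the cluster center (a hypothetical helper for illustration).

    layer_metrics: array of shape (num_layers, num_metrics), one row per layer
                   (e.g., mean activation, activation variance -- assumed metrics).
    num_hints:     number of hint points (clusters) to select.
    """
    kmeans = KMeans(n_clusters=num_hints, n_init=10, random_state=0)
    labels = kmeans.fit_predict(layer_metrics)

    hint_layers = []
    for c, center in enumerate(kmeans.cluster_centers_):
        members = np.where(labels == c)[0]                      # layers assigned to cluster c
        dists = np.linalg.norm(layer_metrics[members] - center, axis=1)
        hint_layers.append(int(members[np.argmin(dists)]))      # member layer nearest the center
    return sorted(hint_layers)

# Toy usage: 16 teacher layers described by 3 synthetic metrics each.
rng = np.random.default_rng(0)
metrics = rng.normal(size=(16, 3))
print(select_hint_points(metrics, num_hints=4))

Because the hint points are derived from the teacher's layers alone, the same selected set can be reused with any student network, which is consistent with the teacher-only applicability claimed in the abstract.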