Unstructured pruning removes a significant fraction of a neural network's weights. However, the result is a sparse network whose architecture is identical to that of the original network. Structured pruning, in contrast, yields an efficient architecture by removing entire channels, but the parameter reduction is less significant. In this paper, we consider transferring knowledge from an unstructured-pruned network to a network with an efficient architecture (i.e., with fewer channels). Specifically, we apply knowledge distillation (KD), where the teacher is a sparse network obtained by unstructured pruning and the student has an efficient architecture. We observe that learning from the pruned teacher is more effective than learning from the unpruned teacher. We further present promising experimental results indicating that unstructured pruning can improve the performance of knowledge distillation in general.
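To make the setup concrete, the following is a minimal sketch (not the paper's exact training code) of distilling from an unstructured-pruned teacher into a student with fewer channels, using PyTorch. The model widths, pruning ratio, temperature `T`, and weight `alpha` are illustrative assumptions, not values reported by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def make_cnn(width):
    # Tiny CNN; `width` controls the number of channels.
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(width, 10),
    )

teacher = make_cnn(width=64)   # original (wide) network
student = make_cnn(width=16)   # efficient architecture: fewer channels

# Unstructured pruning: zero out 80% of each conv layer's weights by magnitude.
# The sparsity level is an assumption for illustration.
for m in teacher.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.8)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Standard Hinton-style KD: soft targets from the (pruned) teacher
    # plus the usual cross-entropy on the ground-truth labels.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# One illustrative training step on random data.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
opt = torch.optim.SGD(student.parameters(), lr=0.01)
with torch.no_grad():
    t_logits = teacher(x)   # pruned teacher provides the soft targets
loss = kd_loss(student(x), t_logits, y)
opt.zero_grad()
loss.backward()
opt.step()
```

The only change relative to standard KD is that the teacher's weights are sparsified by unstructured pruning before its logits are used as soft targets; the student's narrower architecture is what ultimately delivers the efficiency gain.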