The performance of a network compressed by distillation is governed by the quality of that distillation. Suboptimal distillation of a large network (teacher) into a smaller network (student) is largely attributed to the gap in learning capacity between the given teacher-student pair. While it is hard to distill all of a teacher's knowledge, the quality of distillation can be controlled to a large extent to achieve better performance. Our experiments show that the quality of distillation is largely governed by the quality of the teacher's response, which in turn is heavily affected by the presence of similarity information in that response. A well-trained, large-capacity teacher loses similarity information between classes in the process of learning fine-grained discriminative properties for classification. The absence of similarity information reduces the distillation process from one-example-many-class learning to one-example-one-class learning, thereby throttling the flow of diverse knowledge from the teacher. Under the implicit assumption that only instilled knowledge can be distilled, we scrutinize the knowledge inculcation process rather than focusing only on the knowledge distillation process. We argue that, for a given teacher-student pair, the quality of distillation can be improved by finding the sweet spot between batch size and number of epochs when training the teacher. We discuss the steps to find this sweet spot for better distillation. We also propose the distillation hypothesis to differentiate the behavior of the distillation process between knowledge distillation and a regularization effect. We conduct all our experiments on three different datasets.
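To make the notion of similarity information concrete, the following minimal sketch compares the softmax response of two hypothetical teachers on a single example. The logits, class names, and helper functions are illustrative assumptions and are not taken from our experiments; the entropy of the response is used here only as a rough proxy for how much inter-class similarity the response carries.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means mass is spread over more classes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits for one image over three classes (cat, dog, truck).
# Assumption: the first teacher has become highly confident, while the
# second still ranks "dog" as clearly closer to "cat" than "truck" is.
overconfident_teacher = [12.0, 1.0, 0.5]
softer_teacher = [4.0, 2.5, 0.5]

p_over = softmax(overconfident_teacher)
p_soft = softmax(softer_teacher)

print("overconfident:", [round(p, 4) for p in p_over], "entropy:", round(entropy(p_over), 4))
print("softer:       ", [round(p, 4) for p in p_soft], "entropy:", round(entropy(p_soft), 4))
```

In this toy setting the overconfident response is close to a one-hot label, so a student trained on it sees roughly one class per example, whereas the softer response also tells the student that the second class is more plausible than the third.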