This work introduces a novel knowledge distillation framework for classification tasks in which information about existing subclasses is available and taken into account. In classification tasks with a small number of classes, or in binary detection (two classes), the amount of information transferred from the teacher to the student network is restricted, limiting the utility of knowledge distillation. Performance can be improved by leveraging information about possible subclasses within the available classes. To that end, we propose the Subclass Knowledge Distillation (SKD) framework: the process of transferring the subclasses' prediction knowledge from a large teacher model to a smaller student one. Through SKD, additional meaningful information that is absent from the teacher's class logits but present in the subclasses (e.g., similarities within classes) is conveyed to the student, boosting its performance. Mathematically, we measure how many extra bits of information the teacher can provide to the student via the SKD framework. The framework is evaluated in a clinical application, namely binary classification of colorectal polyps. In this application, clinician-provided annotations are used to define subclasses based on annotation-label variability, in a curriculum style of learning. A lightweight, low-complexity student trained with the proposed framework achieves an F1-score of 85.05%, a gain of 2.14% and 1.49% over students trained without and with conventional knowledge distillation, respectively. These results show that the extra subclass knowledge (0.4656 label bits per training sample in our experiment) provides more information about the teacher's generalization, and that SKD can therefore exploit this additional information to improve student performance.
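To make the transfer mechanism concrete, the sketch below shows one plausible form of a subclass-level distillation loss and of the extra-label-bits measurement in PyTorch. It is written under our own assumptions: the loss form, the temperature `T`, the weighting `alpha`, the function names `skd_loss` and `extra_label_bits`, and the entropy-based bits estimate are all illustrative, not the paper's published implementation.

```python
# Minimal sketch, assuming a PyTorch setup in which both teacher and student
# output subclass logits and each subclass belongs to exactly one class.
# All names (skd_loss, extra_label_bits, T, alpha) are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def skd_loss(student_logits, teacher_logits, subclass_labels, T=4.0, alpha=0.5):
    """Subclass distillation: KL divergence between the softened teacher and
    student subclass distributions, plus hard-label cross-entropy on the
    subclass labels (here derived from clinician annotations)."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, subclass_labels)
    return alpha * kd + (1.0 - alpha) * ce

def extra_label_bits(subclass_labels, class_labels):
    """One plausible reading of the 'extra label bits per sample':
    H(subclass) - H(class), i.e. the conditional entropy H(subclass | class)
    when the subclasses partition the classes."""
    def empirical_entropy(labels):
        p = torch.bincount(labels).float()
        p = p / p.sum()
        p = p[p > 0]
        return -(p * torch.log2(p)).sum()
    return empirical_entropy(subclass_labels) - empirical_entropy(class_labels)

# Example: 2 classes split into 4 subclasses (subclasses 0,1 -> class 0;
# subclasses 2,3 -> class 1). At inference, subclass probabilities can be
# summed within each class to recover the binary prediction.
subclass_labels = torch.tensor([0, 0, 1, 2, 2, 3, 3, 3])
class_labels = torch.div(subclass_labels, 2, rounding_mode="floor")
print(extra_label_bits(subclass_labels, class_labels))  # extra bits/sample
```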