This work introduces a novel knowledge distillation framework for classification tasks in which information about existing subclasses is available and taken into account. In classification tasks with a small number of classes, or in binary detection, the amount of information transferred from the teacher to the student is restricted, which limits the utility of knowledge distillation. Performance can be improved by leveraging information about possible subclasses within the classes. To that end, we propose Subclass Knowledge Distillation (SKD), a process of transferring the knowledge of predicted subclasses from a teacher to a smaller student. Meaningful information that is absent from the teacher's class logits but present in its subclass logits (e.g., similarities within classes) is conveyed to the student through SKD, which in turn boosts the student's performance. Analytically, we measure how much extra information the teacher can provide to the student via SKD to demonstrate the efficacy of our work. The framework is evaluated in a clinical application, namely colorectal polyp binary classification, a practical problem with two classes and a number of subclasses per class. In this application, clinician-provided annotations are used to define subclasses based on the variability of the annotation labels, in a curriculum style of learning. A lightweight, low-complexity student trained with the SKD framework achieves an F1-score of 85.05%, a gain of 1.47% over the student trained with conventional knowledge distillation and of 2.10% over the student trained without it. The 2.10% F1-score gap between students trained with and without SKD can be explained by the extra subclass knowledge, i.e., the extra 0.4656 label bits per sample that the teacher can transfer in our experiment.
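The idea of distilling subclass predictions rather than class predictions can be illustrated with a minimal NumPy sketch. It is not the loss used in this work, which the abstract does not specify. It assumes that both teacher and student emit one logit per subclass, that a class probability is the sum of its subclass probabilities, and that the objective mixes a temperature-softened cross-entropy against the teacher with a hard-label class term. The names `skd_loss`, `class_probs`, `subclass_to_class`, `temperature`, and `alpha` are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def class_probs(subclass_probs, subclass_to_class, num_classes):
    """Assumption: a class probability is the sum of its subclass probabilities."""
    out = np.zeros(subclass_probs.shape[:-1] + (num_classes,))
    for s, c in enumerate(subclass_to_class):
        out[..., c] += subclass_probs[..., s]
    return out

def skd_loss(student_logits, teacher_logits, labels, subclass_to_class,
             num_classes, temperature=4.0, alpha=0.5):
    """Hypothetical objective: soft subclass distillation plus hard class loss."""
    eps = 1e-12
    # Soft term: cross-entropy between softened teacher and student
    # distributions over subclasses.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    distill = -(p_teacher * np.log(p_student + eps)).sum(axis=-1).mean()
    # Hard term: negative log-likelihood of the true class, with class
    # probabilities obtained by summing the student's subclass probabilities.
    p_class = class_probs(softmax(student_logits), subclass_to_class, num_classes)
    hard = -np.log(p_class[np.arange(len(labels)), labels] + eps).mean()
    # The temperature**2 factor is the scaling commonly used in conventional
    # knowledge distillation; it is an assumption here as well.
    return alpha * temperature ** 2 * distill + (1 - alpha) * hard

# Two classes with two subclasses each (illustrative values only).
teacher = np.array([[2.0, 0.5, -1.0, 0.0], [0.0, 0.1, 3.0, 1.0]])
student = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.5, 0.2]])
labels = np.array([0, 1])
loss = skd_loss(student, teacher, labels, [0, 0, 1, 1], num_classes=2)
```

With only two classes, a class-level soft target carries a single free probability per sample, whereas the subclass-level target in this sketch carries three, which is the intuition behind the extra information the teacher can transfer.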