Knowledge distillation (KD) is an effective model compression technique in which a compact student network is taught to mimic the behavior of a complex, well-trained teacher network. Mutual learning (ML) provides an alternative strategy in which multiple simple student networks benefit from sharing knowledge with one another, even in the absence of a powerful but static teacher. Motivated by these findings, we propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance. Furthermore, an online distillation strategy is used to train the teacher and students simultaneously. To evaluate the proposed approach, extensive experiments were conducted with three different teacher-student network configurations on benchmark biomedical classification (MSI vs. MSS) and object detection (polyp detection) tasks. An ensemble of student networks trained in the proposed manner achieved better results than ensembles of students trained with KD or ML alone, establishing the benefit of augmenting knowledge transfer from teacher to students with peer-to-peer learning among the students.
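The abstract does not give the exact loss formulation, so the following is only a minimal sketch of how a per-student objective combining a KD term (softened KL to the teacher) with an ML term (KL to peer students) might be implemented; the function name, temperature, and the weights `alpha` and `beta` are illustrative assumptions, not the paper's stated hyperparameters.

```python
import torch
import torch.nn.functional as F

def combined_kd_ml_loss(student_logits, peer_logits_list, teacher_logits, labels,
                        temperature=3.0, alpha=0.5, beta=0.5):
    """Hypothetical per-student loss: supervised cross-entropy, a temperature-softened
    KL distillation term from the teacher (KD), and an averaged KL term toward the
    predictions of the other students (ML)."""
    T = temperature
    # Supervised cross-entropy against ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # KD term: KL divergence between temperature-softened student and teacher distributions.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits.detach() / T, dim=1),
                  reduction="batchmean") * (T * T)
    # ML term: average KL divergence toward each peer student's (detached) prediction.
    ml = sum(F.kl_div(F.log_softmax(student_logits, dim=1),
                      F.softmax(peer.detach(), dim=1),
                      reduction="batchmean")
             for peer in peer_logits_list) / max(len(peer_logits_list), 1)
    return ce + alpha * kd + beta * ml
```

Under the online distillation strategy described above, the teacher would be updated on its own supervised loss in the same training loop, rather than being pre-trained and frozen.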