Knowledge distillation was originally introduced to provide additional supervision from a single teacher model for student model training. To boost student performance, several recent variants attempt to exploit diverse knowledge sources from multiple teachers. However, existing studies mainly integrate knowledge from these diverse sources by averaging multiple teacher predictions or combining them with other label-free strategies, which may mislead the student in the presence of low-quality teacher predictions. To tackle this problem, we propose Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD), which adaptively assigns a sample-wise reliability to each teacher prediction with the help of ground-truth labels: teacher predictions close to the one-hot labels receive large weights. In addition, CA-MKD incorporates intermediate layers to stabilize the knowledge transfer process. Extensive experiments show that CA-MKD consistently outperforms all compared state-of-the-art methods across various teacher-student architectures.
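To make the confidence-weighting idea concrete, the following is a minimal PyTorch-style sketch, not the authors' exact formulation: each teacher receives a sample-wise weight derived from its cross-entropy against the ground-truth labels (lower cross-entropy, i.e. closer to the one-hot label, yields a larger weight), and the student is distilled toward the weighted teachers. The function name, the softmax-over-negative-cross-entropy weighting, and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_kd_loss(teacher_logits_list, student_logits, labels, temperature=4.0):
    """Hypothetical sketch of confidence-aware multi-teacher distillation:
    teachers whose predictions are closer to the one-hot labels get larger
    sample-wise weights in the distillation loss."""
    # Per-sample cross-entropy of each teacher against the ground-truth labels.
    ce_per_teacher = torch.stack([
        F.cross_entropy(t_logits, labels, reduction="none")   # shape: (batch,)
        for t_logits in teacher_logits_list
    ], dim=0)                                                  # shape: (num_teachers, batch)

    # Lower cross-entropy (prediction closer to one-hot label) -> larger weight.
    # This softmax weighting is an assumed instantiation, not the paper's exact rule.
    weights = F.softmax(-ce_per_teacher, dim=0)                # shape: (num_teachers, batch)

    # Weighted KL divergence between the student and each teacher.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    kd_loss = student_logits.new_zeros(())
    for w, t_logits in zip(weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / temperature, dim=1)
        kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)  # (batch,)
        kd_loss = kd_loss + (w * kl).mean()
    return kd_loss * temperature ** 2
```

In this sketch, a low-quality teacher (high cross-entropy on a given sample) is automatically down-weighted for that sample, which captures the motivation stated above; the intermediate-layer term of CA-MKD is omitted here.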