Recent years have witnessed dramatic improvements in knowledge distillation, which produces a compact student model that is more efficient while retaining the effectiveness of the teacher model. Previous studies have found that more accurate teachers do not necessarily make better teachers, owing to a mismatch in abilities between teacher and student. In this paper, we analyze this phenomenon from the perspective of model calibration. We find that larger teacher models may be overconfident, so the student model cannot imitate them effectively. However, after simple calibration of the teacher model, teacher size is positively correlated with student performance.
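The abstract does not say which calibration method is applied to the teacher. The sketch below assumes temperature scaling, a common post-hoc method that divides the logits by a single scalar T fitted on held-out data. It also measures overconfidence with expected calibration error (ECE). All function names (`softmax`, `nll`, `expected_calibration_error`, `fit_temperature`) and the grid of candidate temperatures are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtract the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits_batch, labels, temperature):
    # Mean negative log-likelihood of the true labels at a given temperature.
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        probs = softmax(logits, temperature)
        total -= math.log(max(probs[y], 1e-12))
    return total / len(labels)


def expected_calibration_error(logits_batch, labels, temperature=1.0, n_bins=10):
    # ECE = sum over confidence bins of (|bin| / n) * |accuracy - mean confidence|.
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # [count, sum_conf, sum_correct]
    for logits, y in zip(logits_batch, labels):
        probs = softmax(logits, temperature)
        conf = max(probs)
        pred = probs.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == y else 0.0
    n = len(labels)
    return sum(abs(s_acc - s_conf) / n for count, s_conf, s_acc in bins if count)


def fit_temperature(logits_batch, labels, candidates=None):
    # Assumed fitting procedure: grid search for the T that minimizes held-out NLL.
    if candidates is None:
        candidates = [0.5 + 0.05 * i for i in range(191)]  # 0.5 .. 10.0
    return min(candidates, key=lambda t: nll(logits_batch, labels, t))
```

For an overconfident teacher (large logit margins but imperfect accuracy), the fitted temperature comes out above 1 and softens the predicted probabilities, lowering ECE. Under this assumption, the calibrated probabilities `softmax(teacher_logits, T)` would then serve as the soft targets the student imitates.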