Knowledge distillation (KD) is a widely used technique that leverages large networks to improve the performance of compact models. Previous KD approaches usually aim to guide the student to mimic the teacher's behavior completely in the representation space. However, such one-to-one correspondence constraints may lead to inflexible knowledge transfer from the teacher to the student, especially for students with low model capacity. Inspired by the ultimate goal of KD methods, we propose a novel Evaluation-oriented KD method (EKD) for deep face recognition that directly reduces the performance gap between the teacher and student models during training. Specifically, we adopt the commonly used evaluation metrics in face recognition, i.e., False Positive Rate (FPR) and True Positive Rate (TPR), as the performance indicators. According to the evaluation protocol, the critical pair relations that cause the TPR and FPR difference between the teacher and student models are selected. Then, the critical relations in the student are constrained to approximate the corresponding ones in the teacher by a novel rank-based loss function, giving more flexibility to the student with low capacity. Extensive experimental results on popular benchmarks demonstrate the superiority of our EKD over state-of-the-art competitors.
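To make the idea concrete, the following is a minimal sketch, in PyTorch, of how evaluation-oriented pair selection could work: it estimates a similarity threshold at a target FPR from the teacher's impostor-pair similarities, treats the genuine and impostor pairs near that threshold as critical (since they are the pairs that actually move TPR and FPR), and pulls the student's similarities for those pairs toward the teacher's. The function names, the 0.1 margin, the target FPR, and the smooth-L1 penalty (used here in place of the paper's rank-based loss) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an evaluation-oriented distillation loss.
# The selection rule and penalty are simplified assumptions, not EKD itself.
import torch
import torch.nn.functional as F


def pairwise_cosine(embeddings):
    """All-pairs cosine similarity for a batch of embeddings."""
    z = F.normalize(embeddings, dim=1)
    return z @ z.t()


def ekd_sketch_loss(student_emb, teacher_emb, labels, target_fpr=1e-3):
    """Pull the student's 'critical' pair similarities toward the teacher's.

    Critical pairs are those near the decision threshold at the chosen FPR,
    i.e., the pairs that actually change TPR/FPR under the evaluation protocol.
    """
    sim_s = pairwise_cosine(student_emb)
    sim_t = pairwise_cosine(teacher_emb)

    n = labels.size(0)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=labels.device)
    pos_mask = same & off_diag          # genuine (positive) pairs
    neg_mask = ~same                    # impostor (negative) pairs

    # Threshold from the teacher's impostor similarities at the target FPR.
    neg_t = sim_t[neg_mask]
    k = max(1, int(target_fpr * neg_t.numel()))
    thr = torch.topk(neg_t, k).values.min()

    # Critical pairs: genuine pairs that risk falling below the threshold
    # (hurting TPR) and impostor pairs rising above it (hurting FPR),
    # judged on the student side; the 0.1 margin is an arbitrary choice.
    crit_pos = pos_mask & (sim_s < thr + 0.1)
    crit_neg = neg_mask & (sim_s > thr - 0.1)
    crit = crit_pos | crit_neg

    if crit.sum() == 0:
        return sim_s.new_zeros(())
    # Regression penalty stands in for the paper's rank-based loss.
    return F.smooth_l1_loss(sim_s[crit], sim_t[crit].detach())
```

In practice this term would be added to the student's usual classification loss, with the teacher's embeddings computed under `torch.no_grad()`.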