Knowledge distillation is an effective method to improve the performance of a lightweight neural network (i.e., student model) by transferring the knowledge of a well-performing neural network (i.e., teacher model), and it has been widely applied in many computer vision tasks, including face recognition. Nevertheless, current face recognition distillation methods usually apply Feature Consistency Distillation (FCD) (e.g., an L2 distance) to the embeddings extracted by the teacher and student models for each sample, which cannot fully transfer the teacher's knowledge to the student for face recognition. In this work, we observe that mutual relation knowledge between samples is also important for improving the discriminative ability of the student model's learned representation, and we propose an effective face recognition distillation method called CoupleFace, which additionally introduces Mutual Relation Distillation (MRD) into the existing distillation framework. Specifically, in MRD, we first mine informative mutual relations, and then introduce the Relation-Aware Distillation (RAD) loss to transfer the mutual relation knowledge of the teacher model to the student model. Extensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed CoupleFace for face recognition. Moreover, based on our proposed CoupleFace, we won first place in the ICCV21 Masked Face Recognition Challenge (MS1M track).
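To make the two distillation terms concrete, below is a minimal PyTorch sketch. The `fcd_loss` term implements the standard feature consistency objective (L2 distance between normalized embeddings) described above; the `mrd_loss` term illustrates one plausible form of mutual relation matching via pairwise cosine similarities. The function names, the similarity-matching formulation, and the combination weight are illustrative assumptions; the abstract does not specify the exact RAD loss or the mining strategy for informative mutual relations.

```python
import torch
import torch.nn.functional as F

def fcd_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    """Feature Consistency Distillation: L2 distance between the
    (typically L2-normalized) per-sample embeddings of the two models."""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    return ((s - t) ** 2).sum(dim=1).mean()

def mrd_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    """Hypothetical mutual-relation term: match pairwise cosine
    similarities between samples in the batch. The paper's actual RAD
    loss additionally mines informative relations, omitted here."""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    sim_s = s @ s.t()  # student's mutual relations (B x B)
    sim_t = t @ t.t()  # teacher's mutual relations (B x B)
    return F.mse_loss(sim_s, sim_t)

if __name__ == "__main__":
    # Toy batch of 8 samples with 512-dim embeddings; in practice these
    # come from the student and (frozen) teacher backbones.
    student_emb = torch.randn(8, 512, requires_grad=True)
    teacher_emb = torch.randn(8, 512)
    lambda_mrd = 1.0  # assumed weighting hyperparameter
    loss = fcd_loss(student_emb, teacher_emb) + lambda_mrd * mrd_loss(student_emb, teacher_emb)
    loss.backward()
```

In this sketch, FCD aligns each student embedding with its teacher counterpart individually, while the relation term constrains the structure of similarities across samples, which is the kind of knowledge that per-sample consistency alone cannot transfer.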