Knowledge distillation (KD) has been actively studied for image classification tasks in deep learning, aiming to improve the performance of a student model based on the knowledge from a teacher model. However, KD for image regression with a scalar response variable has rarely been studied, and no existing KD method is applicable to both classification and regression tasks. Moreover, existing KD methods often require a practitioner to carefully select or adjust the teacher and student architectures, making these methods less flexible in practice. To address the above problems in a unified way, we propose a comprehensive KD framework based on cGANs, termed cGAN-KD. Fundamentally different from existing KD methods, cGAN-KD distills and transfers knowledge from a teacher model to a student model via cGAN-generated samples. This novel mechanism makes cGAN-KD suitable for both classification and regression tasks, compatible with other KD methods, and insensitive to the teacher and student architectures. An error bound for a student model trained in the cGAN-KD framework is derived in this work, explaining why cGAN-KD is effective and guiding its practical implementation. Extensive experiments on CIFAR-100 and ImageNet-100 show that we can combine state-of-the-art KD methods with the cGAN-KD framework to yield a new state of the art. Moreover, experiments on Steering Angle and UTKFace demonstrate the effectiveness of cGAN-KD in image regression tasks, where existing KD methods are inapplicable.
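As a rough illustration of the data-level mechanism the abstract describes, the sketch below shows one way the pipeline could look for classification: a conditional GAN generates samples, the teacher relabels them, and the student is then trained on the synthetic data with an ordinary supervised loss. This is a minimal sketch under assumed interfaces; all names (generator, teacher, student, synthesize_and_relabel) are illustrative placeholders, not the authors' actual code, and the paper's filtering/subsampling steps are omitted.

```python
# Hypothetical sketch of a cGAN-based KD pipeline (not the paper's implementation):
# 1) draw conditional samples from a trained cGAN generator,
# 2) replace their labels with the teacher's predictions,
# 3) train the student with a plain supervised loss on the synthetic data.
import torch
import torch.nn.functional as F

@torch.no_grad()
def synthesize_and_relabel(generator, teacher, n, n_classes, z_dim, device="cpu"):
    """Generate n conditional samples and relabel them with the teacher."""
    z = torch.randn(n, z_dim, device=device)                # latent noise
    y = torch.randint(0, n_classes, (n,), device=device)    # sampled conditions
    x_fake = generator(z, y)                                # conditional generation
    y_teacher = teacher(x_fake).argmax(dim=1)               # teacher relabeling
    return x_fake, y_teacher

def student_step(student, optimizer, x, y):
    """One ordinary supervised update; the student never queries the teacher."""
    optimizer.zero_grad()
    loss = F.cross_entropy(student(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the teacher's knowledge is baked into the (relabeled) synthetic training set rather than into the student's loss function, the same scheme carries over to regression by swapping the cross-entropy loss for, e.g., an MSE loss on the teacher's scalar predictions, which is what makes the framework architecture-agnostic and combinable with other KD methods.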