Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

Large-scale pre-trained models have recently shown their advantages in many tasks. However, their heavy computational cost and storage requirements make them difficult to deploy in real-world scenarios. A common solution is knowledge distillation, which treats the large-scale model as a teacher and uses it to train a small student model that reaches competitive performance. Cross-task knowledge distillation further expands the application scenarios of large-scale pre-trained models. Existing knowledge distillation methods focus on directly mimicking the final predictions or the intermediate layers of the teacher model, which represent global-level characteristics and are task-specific. To alleviate the constraint of different label spaces, it is essential to capture invariant, intrinsic local object characteristics (such as the shapes of the legs and tails of cattle and horses). Considering the complexity and variability of real-world tasks, we propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach that transfers the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios. First, to better transfer the generalized knowledge of the teacher model in cross-task scenarios, we propose a prototype learning module that learns from the essential feature representations of objects in the teacher model. Second, for diverse downstream tasks, we propose a task-adaptive feature augmentation module that enhances the features of the student model with the learned generalized prototype features and guides the training of the student model to improve its generalization ability. Experimental results on various visual tasks demonstrate the effectiveness of our approach for cross-task knowledge distillation from large-scale models.
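The abstract does not specify how prototype features are combined with student features, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a set of prototype vectors has already been learned from the teacher, and that each local student feature is enhanced by adding an attention-weighted mixture of those prototypes (scaled dot-product attention is swapped in here purely for illustration). The function name `augment_with_prototypes` and all inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def augment_with_prototypes(student_feats, prototypes):
    """Hypothetical feature augmentation (an assumption, not the paper's module).

    Each local student feature attends over the prototype vectors and
    receives their attention-weighted mixture as a residual.
    """
    dim = len(prototypes[0])
    scale = math.sqrt(dim)
    augmented = []
    for feat in student_feats:
        # Similarity of this local feature to every prototype.
        weights = softmax([dot(feat, p) / scale for p in prototypes])
        # Attention-weighted mixture of the prototypes.
        mix = [sum(w * p[d] for w, p in zip(weights, prototypes))
               for d in range(dim)]
        # Residual connection keeps the original student feature.
        augmented.append([x + m for x, m in zip(feat, mix)])
    return augmented


if __name__ == "__main__":
    # Two toy prototypes and one uninformative (all-zero) student feature.
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    print(augment_with_prototypes([[0.0, 0.0]], prototypes))  # [[0.5, 0.5]]
```

In this sketch a student feature that resembles no prototype in particular receives an even blend of all prototypes, while a feature strongly aligned with one prototype is pulled almost entirely toward that prototype. How the prototypes are learned from the teacher, and how the augmentation adapts to each downstream task, is left open here because the abstract gives no detail.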
