This paper introduces a novel dynamic knowledge distillation framework, Gompertz-CNN, which integrates the Gompertz growth model into the training process to address the limitations of traditional knowledge distillation. Conventional methods often fail to capture the evolving cognitive capacity of student models, leading to suboptimal knowledge transfer. To overcome this, we propose a stage-aware distillation strategy that dynamically adjusts the weight of distillation loss based on the Gompertz curve, reflecting the student's learning progression: slow initial growth, rapid mid-phase improvement, and late-stage saturation. Our framework incorporates Wasserstein distance to measure feature-level discrepancies and gradient matching to align backward propagation behaviors between teacher and student models. These components are unified under a multi-loss objective, where the Gompertz curve modulates the influence of distillation losses over time. Extensive experiments on CIFAR-10 and CIFAR-100 using various teacher-student architectures (e.g., ResNet50 and MobileNet_v2) demonstrate that Gompertz-CNN consistently outperforms traditional distillation methods, achieving up to 8% and 4% accuracy gains on CIFAR-10 and CIFAR-100, respectively.
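For concreteness, one plausible instantiation of the Gompertz-modulated objective is sketched below. The parameter names (A, b, c, λ₁, λ₂) and the exact composition of the loss terms are illustrative assumptions, not the paper's stated equations; they show only how a Gompertz curve can scale the distillation terms over training.

```latex
% Gompertz weight over normalized training progress t \in [0, 1]:
% slow initial growth, rapid mid-phase increase, late-stage saturation at A.
% A: asymptotic weight; b, c > 0: displacement and growth-rate hyperparameters (illustrative).
w(t) = A \exp\!\bigl(-b\, e^{-c\, t}\bigr)

% Multi-loss objective: cross-entropy on labels plus Gompertz-scaled distillation terms.
% \mathcal{L}_{\mathrm{WD}}: Wasserstein feature-matching loss;
% \mathcal{L}_{\mathrm{grad}}: gradient-matching loss (names assumed for illustration).
\mathcal{L}(t) = \mathcal{L}_{\mathrm{CE}}
  + w(t)\,\bigl(\lambda_{1}\,\mathcal{L}_{\mathrm{WD}} + \lambda_{2}\,\mathcal{L}_{\mathrm{grad}}\bigr)
```

Under this form, the student is driven mainly by the ground-truth cross-entropy early in training and receives progressively stronger teacher supervision as its capacity grows, which matches the stage-aware behavior described above.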