Knowledge distillation (KD) shows bright promise as a powerful regularization strategy for boosting generalization ability by leveraging learned sample-level soft targets. Yet employing a complex pre-trained teacher network or an ensemble of peer students, as existing KD methods do, is both time-consuming and computationally costly. Various self-KD methods have been proposed to achieve higher distillation efficiency, but they either require extra modification of the network architecture or are difficult to parallelize. To cope with these challenges, we propose an efficient and reliable self-distillation framework, named Self-Distillation from Last Mini-Batch (DLB). Specifically, we rearrange the sequential sampling so that half of each mini-batch coincides with the previous iteration, while the other half coincides with the upcoming iteration. The former half then distills on-the-fly soft targets generated in the previous iteration. The proposed mechanism improves training stability and consistency, resulting in robustness to label noise. Moreover, our method is easy to implement, requiring neither extra run-time memory nor modification of the model structure. Experimental results on three classification benchmarks show that our approach consistently outperforms state-of-the-art self-distillation approaches across different network architectures. In addition, our method is highly compatible with augmentation strategies, gaining further performance improvement. The code is available at https://github.com/Meta-knowledge-Lab/DLB.
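To make the overlapped-sampling idea concrete, below is a minimal sketch of the training loop in a PyTorch style. It assumes the data loader already yields mini-batches arranged so that the second half of batch t reappears as the first half of batch t+1; the function name `dlb_train_epoch`, the temperature `tau`, and the weight `alpha` are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def dlb_train_epoch(model, optimizer, batches, tau=3.0, alpha=1.0):
    """Sketch of Self-Distillation from Last Mini-Batch (DLB).

    `batches` is assumed to yield (inputs, labels) pairs arranged so that
    the first half of each mini-batch coincides with the second half of the
    previous one (the rearranged sequential sampling described above).
    """
    prev_soft = None  # soft targets cached from the last iteration
    for inputs, labels in batches:
        logits = model(inputs)
        loss = F.cross_entropy(logits, labels)

        half = inputs.size(0) // 2
        if prev_soft is not None:
            # Distill the overlapping first half against the on-the-fly
            # soft targets generated in the previous iteration.
            kd_loss = F.kl_div(
                F.log_softmax(logits[:half] / tau, dim=1),
                prev_soft,
                reduction="batchmean",
            ) * (tau ** 2)
            loss = loss + alpha * kd_loss

        # Cache soft targets for the second half, which recurs as the
        # first half of the next mini-batch.
        prev_soft = F.softmax(logits[half:] / tau, dim=1).detach()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because only the softened predictions of half a mini-batch are cached between iterations, the sketch adds no persistent run-time memory beyond a single tensor and requires no change to the model structure, consistent with the claim above.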