To boost performance, deep neural networks require deeper or wider architectures, which incur massive computational and memory costs. To alleviate this issue, self-knowledge distillation regularizes a model by distilling its own internal knowledge. Conventional self-knowledge distillation methods require additional trainable parameters or depend on the data. In this paper, we propose a simple and effective self-knowledge distillation method using dropout (SD-Dropout). SD-Dropout distills the posterior distributions of multiple models obtained through dropout sampling. Our method requires no additional trainable modules, does not rely on data, and involves only simple operations. Furthermore, it can be easily combined with various self-knowledge distillation approaches. We provide a theoretical and experimental analysis of the effect of forward and reverse KL-divergence in our method. Extensive experiments on various vision tasks, i.e., image classification, object detection, and distribution shift, demonstrate that the proposed method effectively improves the generalization of a single network. Further experiments show that it also improves calibration, adversarial robustness, and out-of-distribution detection.
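To make the idea concrete, below is a minimal PyTorch-style sketch of a dropout-based self-distillation loss. It is an illustration under assumptions, not the authors' exact formulation: the function name `sd_dropout_loss`, the temperature `tau`, the weight `alpha`, and the choice to apply cross-entropy to only one view are all hypothetical. It shows the core mechanism described in the abstract: two dropout-sampled posteriors of the same network regularize each other through forward and reverse KL-divergence.

```python
import torch
import torch.nn.functional as F

def sd_dropout_loss(features, classifier, labels, p=0.5, tau=4.0, alpha=1.0):
    """Hypothetical sketch of a dropout-based self-distillation loss.

    Two dropout samples of the shared features are passed through the
    classifier; the resulting posteriors distill each other via forward
    and reverse KL-divergence, alongside the usual cross-entropy.
    """
    # Two independent dropout samples of the same backbone features.
    z1 = F.dropout(features, p=p, training=True)
    z2 = F.dropout(features, p=p, training=True)
    logits1, logits2 = classifier(z1), classifier(z2)

    # Standard supervised loss on one sampled view (assumed choice).
    ce = F.cross_entropy(logits1, labels)

    # Temperature-softened log-posteriors of the two views.
    log_p1 = F.log_softmax(logits1 / tau, dim=1)
    log_p2 = F.log_softmax(logits2 / tau, dim=1)

    # Forward and reverse KL between the dropout-sampled posteriors.
    # F.kl_div(input, target) computes KL(target || input).
    kl_fwd = F.kl_div(log_p2, log_p1, log_target=True, reduction="batchmean")
    kl_rev = F.kl_div(log_p1, log_p2, log_target=True, reduction="batchmean")

    return ce + alpha * (tau ** 2) * (kl_fwd + kl_rev)
```

In this sketch the regularizer adds no trainable parameters and needs no extra data, matching the properties claimed in the abstract; only the dropout layer and two extra forward passes through the classifier head are required.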