Knowledge distillation has been widely adopted across a variety of tasks and has achieved remarkable success. Since its inception, many researchers have been intrigued by the dark knowledge hidden in the outputs of the teacher model. A recent study demonstrated that knowledge distillation and label smoothing can be unified as learning from soft labels. Consequently, how to measure the effectiveness of soft labels becomes an important question. Most existing theories place stringent constraints on the teacher model or the data distribution, and many of their assumptions imply that the soft labels are close to the ground-truth labels. This paper studies whether biased soft labels are still effective. We present two more comprehensive indicators to measure the effectiveness of such soft labels. Based on these two indicators, we give sufficient conditions under which learners trained on biased soft labels are classifier-consistent and ERM learnable. We apply the theory to three weakly supervised frameworks. Experimental results show that biased soft labels can also teach good students, which corroborates the soundness of the theory.