Minimizing prediction uncertainty on unlabeled data is a key factor in achieving good performance in semi-supervised learning (SSL). Prediction uncertainty is typically expressed as the \emph{entropy} computed from the transformed probabilities in the output space. Most existing works distill low-entropy predictions by either accepting the dominant class (the one with the largest probability) as the true label or suppressing subtle predictions (those with smaller probabilities). However, these distillation strategies are usually heuristic and less informative for model training. Motivated by this observation, this paper proposes a dual mechanism, named ADaptive Sharpening (\ADS), which first applies a soft threshold to adaptively mask out determinate and negligible predictions, and then seamlessly sharpens the informed predictions, distilling certain predictions using only the informed ones. More importantly, we theoretically analyze the traits of \ADS by comparing it with various distillation strategies. Extensive experiments verify that \ADS significantly improves state-of-the-art SSL methods when used as a plug-in. Our proposed \ADS forges a cornerstone for future distillation-based SSL research.
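The masking-then-sharpening idea described above can be sketched as follows. This is a minimal illustrative interpretation, not the paper's actual implementation: the threshold `tau` and temperature `T` are hypothetical parameters, and the soft-threshold/sharpening forms are assumptions chosen only to convey the two-step structure (mask out negligible probability mass, then sharpen what remains).

```python
import numpy as np

def adaptive_sharpening_sketch(p, tau=0.1, T=0.5):
    """Illustrative sketch of a soft-threshold-then-sharpen mechanism.

    `tau` (soft threshold) and `T` (sharpening temperature) are
    hypothetical parameters, not values from the paper.
    """
    p = np.asarray(p, dtype=float)
    # Step 1: soft-threshold -- shrink all class probabilities by tau and
    # clip at zero, so negligible predictions are masked out entirely.
    informed = np.maximum(p - tau, 0.0)
    if informed.sum() == 0.0:
        return p  # nothing survives the threshold; fall back to the input
    # Step 2: temperature sharpening on the surviving (informed) mass,
    # then renormalize back to a probability distribution.
    sharp = informed ** (1.0 / T)
    return sharp / sharp.sum()

# Example: the two negligible classes are zeroed out and the
# remaining mass is concentrated on the dominant class.
probs = adaptive_sharpening_sketch([0.6, 0.3, 0.05, 0.05])
```

With `T < 1` the surviving probabilities are pushed toward a lower-entropy distribution, while the clipping step prevents near-zero classes from contributing to the sharpened output at all.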