Soft labels generated by teacher models have become a dominant paradigm for knowledge transfer and for recent large-scale dataset distillation methods such as SRe2L, RDED, and LPLD, offering richer supervision than conventional hard labels. However, we observe that when only a limited number of crops per image are used, soft labels are prone to local semantic drift: a crop may visually resemble another class, causing its soft embedding to deviate from the ground-truth semantics of the original image. This mismatch between local visual content and global semantic meaning introduces systematic errors and a distribution misalignment between training and testing. In this work, we revisit the overlooked role of hard labels and show that, when appropriately integrated, they provide a powerful content-agnostic anchor that calibrates semantic drift. We theoretically characterize how drift emerges under limited soft-label supervision and demonstrate that hybridizing soft and hard labels restores the alignment between visual content and semantic supervision. Building on this insight, we propose a new training paradigm, Hard Label for Alleviating Local Semantic Drift (HALD), which leverages hard labels as intermediate corrective signals while retaining the fine-grained advantages of soft labels. Extensive experiments on dataset distillation and large-scale conventional classification benchmarks validate our approach, showing consistent improvements in generalization. On ImageNet-1K, we achieve 42.7% accuracy with only 285 MB of soft-label storage, outperforming the prior state-of-the-art LPLD by 9.0%. Our findings re-establish the importance of hard labels as a complementary tool and call for a rethinking of their role in soft-label-dominated training.
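To make the hybrid supervision concrete, the sketch below shows a generic soft/hard label mixture loss in PyTorch: cross-entropy on the hard label combined with temperature-scaled KL divergence to the teacher's soft label, blended by a weight. This is a minimal illustration under assumed hyperparameters (`alpha`, `tau` are hypothetical), not the HALD algorithm itself, whose intermediate corrective schedule is defined in the method section.

```python
# Minimal sketch (assumption): a generic hybrid soft/hard label loss.
# `alpha` and the temperature `tau` are hypothetical hyperparameters,
# not the corrective schedule used by HALD.
import torch
import torch.nn.functional as F


def hybrid_label_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      hard_labels: torch.Tensor,
                      alpha: float = 0.5,
                      tau: float = 4.0) -> torch.Tensor:
    """Blend hard-label cross-entropy with soft-label KL distillation."""
    # Hard-label term: a content-agnostic anchor tied to the image's ground-truth class.
    ce = F.cross_entropy(student_logits, hard_labels)
    # Soft-label term: temperature-scaled KL divergence to the teacher's crop-level distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
    return alpha * ce + (1.0 - alpha) * kl


# Usage example with random tensors (batch of 8, 1000 classes as in ImageNet-1K).
if __name__ == "__main__":
    student = torch.randn(8, 1000)
    teacher = torch.randn(8, 1000)
    labels = torch.randint(0, 1000, (8,))
    print(hybrid_label_loss(student, teacher, labels).item())
```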