Reasoning distillation has attracted increasing attention. It typically leverages a large teacher model to generate reasoning paths, which are then used to fine-tune a student model so that it mimics the teacher's behavior in training contexts. However, previous approaches lack a detailed analysis of the origins of the distilled model's capabilities: it remains unclear whether the student maintains behavior consistent with the teacher's in novel test-time contexts, or whether it regresses to its original output patterns, raising concerns about the generalization of distilled models. To investigate this question, we introduce a cross-model Reasoning Distillation Provenance Tracing framework. For each action (e.g., a sentence) produced by the distilled model, we obtain the predictive probabilities assigned by the teacher, the original student, and the distilled model under the same context. By comparing these probabilities, we classify each action into distinct provenance categories. By systematically disentangling the provenance of each action, we experimentally demonstrate that, in test-time contexts, the distilled model can indeed generate teacher-originated actions, which correlate with, and plausibly explain, the distilled model's observed performance. Building on this analysis, we further propose a teacher-guided data selection method. Unlike prior approaches that rely on heuristics, our method directly compares teacher-student divergences on the training data, providing a principled selection criterion. We validate the effectiveness of our approach across multiple representative teacher models and diverse student models. The results highlight the utility of our provenance-tracing framework and underscore its promise for reasoning distillation. We hope our Reasoning Distillation Provenance Tracing framework and the insights it yields will benefit the community's work on reasoning distillation.
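To make the provenance-tracing step concrete, the sketch below shows one way the per-action comparison could be implemented. Everything here is illustrative: the mean-log-probability scoring, the margin threshold, and the category names are our assumptions, not a description of the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ActionScores:
    """Mean per-token log-probability that each model assigns to one
    action (e.g., a sentence) under the same preceding context."""
    teacher: float    # large teacher model
    student: float    # original, pre-distillation student
    distilled: float  # student after distillation

def classify_provenance(s: ActionScores, margin: float = 1.0) -> str:
    """Label a distilled-model action by which source model already
    found it likely. Category names and the margin are illustrative."""
    teacher_likes = s.teacher >= s.distilled - margin
    student_likes = s.student >= s.distilled - margin
    if teacher_likes and not student_likes:
        return "teacher-originated"  # inherited from the teacher
    if student_likes and not teacher_likes:
        return "student-originated"  # the base student already had it
    if teacher_likes and student_likes:
        return "shared"              # both source models predict it well
    return "emergent"                # neither source model predicts it well

# Example: the teacher scores the action almost as well as the distilled
# model, while the original student finds it unlikely -> teacher-originated.
print(classify_provenance(ActionScores(teacher=-0.8, student=-3.5, distilled=-0.6)))
```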
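Similarly, a minimal sketch of the teacher-guided selection criterion, assuming both models can score the same tokenized reasoning path. The per-token KL estimator and the keep-most-divergent policy are assumptions on our part; whether to retain the most or the least divergent examples is a design choice the abstract leaves open.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sequence_divergence(teacher_logits: torch.Tensor,
                        student_logits: torch.Tensor) -> float:
    """Mean per-token KL(teacher || student) for one training example.

    Both tensors have shape [seq_len, vocab_size] and come from scoring
    the same tokenized reasoning path with each model.
    """
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    s_logp = F.log_softmax(student_logits, dim=-1)
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(dim=-1)  # KL at each position
    return kl.mean().item()

def select_training_examples(divergences: list[float], k: int) -> list[int]:
    """Return indices of the k examples where teacher and student diverge
    most (an assumed policy: these carry the most teacher-specific signal)."""
    ranked = sorted(range(len(divergences)),
                    key=lambda i: divergences[i], reverse=True)
    return ranked[:k]
```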