Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a strong learner like GPT-4. We consider a student and a teacher that are both random feature models, described by two-layer networks with a random, fixed bottom layer and a trained top layer. A "weak" teacher, with a small number of units (i.e., random features), is trained on the population, and a "strong" student, with a much larger number of units (i.e., random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and explain how the student can outperform the teacher, even though it is trained only on data labeled by the teacher, and how such weak-to-strong generalization is enabled by early stopping. Importantly, we also show the quantitative limits of weak-to-strong generalization in this model.
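The setup described above can be made concrete in a few lines of code. Below is a minimal numpy sketch, not the paper's actual experiment: it assumes Gaussian inputs, an arbitrary toy target function, ReLU random features, a teacher whose top layer is fit by least squares on a large sample standing in for the population, and a student whose top layer is trained by early-stopped gradient descent on teacher-generated labels. All dimensions, sample sizes, learning rates, and step counts are illustrative; whether and by how much the student improves on the teacher depends on these choices, which is precisely the quantitative question the abstract refers to.

import numpy as np

rng = np.random.default_rng(0)
d = 20                            # input dimension (assumed)
n_pop = 20000                     # large sample standing in for "the population"
n_student = 2000                  # inputs labeled by the weak teacher
m_teacher, m_student = 10, 2000   # numbers of units (random features)

def target(X):
    # Assumed ground-truth function; not specified in the abstract.
    return np.tanh(X[:, 0] + 0.5 * X[:, 1] ** 2)

def random_features(X, W):
    # Fixed random bottom layer with ReLU nonlinearity; only the top layer is trained.
    return np.maximum(X @ W, 0.0) / np.sqrt(W.shape[1])

# Data (Gaussian inputs, an assumption made for this sketch).
X_pop = rng.standard_normal((n_pop, d))
X_stu = rng.standard_normal((n_student, d))
X_test = rng.standard_normal((5000, d))
y_pop, y_test = target(X_pop), target(X_test)

# Weak teacher: few units, top layer fit by least squares on the (approximate) population.
W_teacher = rng.standard_normal((d, m_teacher)) / np.sqrt(d)
a_teacher, *_ = np.linalg.lstsq(random_features(X_pop, W_teacher), y_pop, rcond=None)
teacher = lambda X: random_features(X, W_teacher) @ a_teacher

# Strong student: many units, trained only on teacher-generated labels,
# with early-stopped gradient descent on the top layer.
W_student = rng.standard_normal((d, m_student)) / np.sqrt(d)
Phi = random_features(X_stu, W_student)
y_weak = teacher(X_stu)                  # labels come from the weak teacher, not the target
a_student = np.zeros(m_student)
lr, n_steps = 1.0, 300                   # stopping early acts as implicit regularization
for _ in range(n_steps):
    a_student -= lr * Phi.T @ (Phi @ a_student - y_weak) / n_student
student = lambda X: random_features(X, W_student) @ a_student

# Compare test error against the true target for teacher and student.
mse = lambda f: np.mean((f(X_test) - y_test) ** 2)
print(f"teacher test MSE: {mse(teacher):.4f}")
print(f"student test MSE: {mse(student):.4f}")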