Large "instruction-tuned" language models (finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off its own generations. Our pipeline generates instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model. Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT_001, which is trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT_001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning.