We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at https://github.com/google-research/FLAN/tree/main/flan/v2.
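To illustrate what training with mixed prompt settings means in practice, the following is a minimal sketch, not taken from the Flan codebase and using hypothetical field names (`instruction`, `input`, `target`, `rationale`), of how each training example could be formatted under a randomly chosen zero-shot, few-shot, or chain-of-thought template before being added to the finetuning mixture.

```python
# Illustrative sketch (hypothetical, not the Flan implementation): format each
# training example under one of three prompt settings so the finetuning
# mixture contains zero-shot, few-shot, and chain-of-thought style inputs.
import random

def format_example(example, exemplars, rng=random):
    """Return an (input, target) pair under a randomly chosen prompt setting.

    `example` is assumed to be a dict with 'instruction', 'input', 'target',
    and optionally 'rationale'; `exemplars` is a list of solved examples from
    the same task, used only for the few-shot setting.
    """
    setting = rng.choice(["zero_shot", "few_shot", "chain_of_thought"])

    if setting == "zero_shot":
        prompt = f"{example['instruction']}\n{example['input']}"
        target = example["target"]
    elif setting == "few_shot":
        # Prepend a handful of solved exemplars before the query.
        shots = "\n\n".join(
            f"{ex['input']}\n{ex['target']}" for ex in exemplars[:3]
        )
        prompt = f"{example['instruction']}\n\n{shots}\n\n{example['input']}"
        target = example["target"]
    else:
        # Chain-of-thought: elicit reasoning and include a rationale in the target.
        prompt = (
            f"{example['instruction']}\n{example['input']}\n"
            "Let's think step by step."
        )
        rationale = example.get("rationale", "")
        target = f"{rationale} So the answer is {example['target']}."

    return prompt, target
```

In this sketch the choice of setting is sampled per example, so a single task contributes inputs in all three formats; the actual Flan 2022 templating and mixing weights are defined in the released collection linked above.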