We propose a multitask pretraining approach ZeroPrompt for zero-shot generalization, focusing on task scaling and zero-shot prompting. While previous models are trained on only a few dozen tasks, we scale to 1,000 tasks for the first time using real-world data. This leads to a crucial discovery that task scaling can be an efficient alternative to model scaling; i.e., the model size has little impact on performance with an extremely large number of tasks. Our results show that task scaling can substantially improve training efficiency by 30 times in FLOPs. Moreover, we present a prompting method that incorporates a genetic algorithm to automatically search for the best prompt for unseen tasks, along with a few other improvements. Empirically, ZeroPrompt substantially improves both the efficiency and the performance of zero-shot learning across a variety of academic and production datasets.
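The genetic-algorithm prompt search mentioned in the abstract can be pictured as a standard evolve-and-select loop over candidate prompts. The Python sketch below is a minimal illustration only, not the paper's implementation: the token pool, the `score_prompt` fitness placeholder, and all hyperparameters are assumptions added for clarity.

```python
import random

# Hypothetical token pool for assembling candidate prompts (assumption).
TOKEN_POOL = ["Question:", "Answer:", "Task:", "Given", "the", "text", ",", "classify", "it", "."]


def score_prompt(prompt_tokens, eval_examples):
    """Placeholder fitness function (assumption): in practice this would be the
    zero-shot score of the pretrained model with the candidate prompt prepended,
    measured on held-out data. Here it just returns a random value."""
    return random.random()


def mutate(tokens):
    """Replace one token at a random position."""
    tokens = list(tokens)
    i = random.randrange(len(tokens))
    tokens[i] = random.choice(TOKEN_POOL)
    return tokens


def crossover(a, b):
    """Single-point crossover between two equal-length parents."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_prompt_search(eval_examples, pop_size=20, generations=10, prompt_len=6):
    """Evolve a population of fixed-length prompts toward higher fitness."""
    population = [[random.choice(TOKEN_POOL) for _ in range(prompt_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda p: score_prompt(p, eval_examples),
                        reverse=True)
        parents = scored[: pop_size // 2]  # keep the fittest half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda p: score_prompt(p, eval_examples))


if __name__ == "__main__":
    best = genetic_prompt_search(eval_examples=[])
    print(" ".join(best))
```

With a real model-based fitness function in place of `score_prompt`, the same loop searches the discrete prompt space without gradient access, which is the property that makes a genetic algorithm a natural fit for prompt selection.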