When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction. Proponents of prompting have argued that prompts provide a method for injecting task-specific guidance, which is beneficial in low-data regimes. We aim to quantify this benefit through rigorous testing of prompts in a fair setting: comparing prompted and head-based fine-tuning under equal conditions across many tasks and data sizes. By controlling for many sources of advantage, we find that prompting does indeed provide a benefit, and that this benefit can be quantified per task. Results show that prompting is often worth hundreds of data points on average across classification tasks.
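To make the two settings concrete, below is a minimal sketch (not the paper's code) of head-based versus prompt-based classification with the Hugging Face transformers library. The model name, the pattern "It was [MASK].", and the bad/great verbalizer are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch contrasting head-based and prompt-based fine-tuning setups.
# Assumptions (not from the paper): bert-base-uncased, a binary sentiment
# task, the pattern "It was [MASK].", and the verbalizer {bad: 0, great: 1}.
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
text = "A visually stunning film with a hollow plot."

# Head-based: a randomly initialized classification head sits on top of the
# pretrained encoder and must be learned from task data alone.
head_model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
head_logits = head_model(**tok(text, return_tensors="pt")).logits  # shape (1, 2)

# Prompt-based: the task is recast as masked-word prediction, so the
# pretrained MLM head is reused and the label words carry task guidance.
mlm_model = AutoModelForMaskedLM.from_pretrained(name)
inputs = tok(f"{text} It was {tok.mask_token}.", return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
token_logits = mlm_model(**inputs).logits[0, mask_pos]
label_words = ["bad", "great"]  # verbalizer: index 0 = negative, 1 = positive
prompt_logits = token_logits[tok.convert_tokens_to_ids(label_words)]  # shape (2,)
```

In both cases the whole model would then be fine-tuned end to end on the same labeled examples; the only difference is whether predictions come from a new head or from label-word logits at the mask position.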