With the success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Although several parameter-efficient tuning paradigms have been proposed to address this problem, they still require substantial resources to compute gradients during training. In this paper, we propose $\mathcal{Y}$-Tuning, an efficient yet effective paradigm for adapting frozen large-scale PTMs to specific downstream tasks. $\mathcal{Y}$-Tuning learns dense representations for the labels $\mathcal{Y}$ defined in a given task and aligns them to fixed feature representations. Without tuning the features of the input text or the model parameters, $\mathcal{Y}$-Tuning is both parameter-efficient and training-efficient. For $\text{DeBERTa}_\text{XXL}$ with 1.6 billion parameters, $\mathcal{Y}$-Tuning achieves more than $96\%$ of the performance of full fine-tuning on the GLUE Benchmark with only $2\%$ of the tunable parameters and much lower training costs.
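To make the core idea concrete, the following is a minimal sketch of a $\mathcal{Y}$-Tuning-style head in PyTorch: trainable dense label representations are aligned with features from a frozen PTM, and only the small head receives gradients. The cross-attention alignment module, the class and argument names, and the scoring layer are assumptions for illustration; the abstract only specifies that label representations are learned and aligned to fixed feature representations.

```python
import torch
import torch.nn as nn


class YTuningHead(nn.Module):
    """Sketch of a Y-Tuning-style head: trainable label embeddings aligned
    to frozen PTM features. The alignment choice (cross-attention) is a
    hypothetical instantiation, not the paper's exact architecture."""

    def __init__(self, num_labels: int, hidden_size: int):
        super().__init__()
        # Trainable dense representations for the labels in Y.
        self.label_embeddings = nn.Parameter(
            torch.randn(num_labels, hidden_size) * 0.02
        )
        # Light alignment module: label embeddings attend to frozen features.
        self.cross_attn = nn.MultiheadAttention(
            hidden_size, num_heads=8, batch_first=True
        )
        # One scalar score per aligned label representation.
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, frozen_features: torch.Tensor) -> torch.Tensor:
        # frozen_features: (batch, seq_len, hidden_size), produced by the
        # frozen PTM under torch.no_grad(), so no gradients flow through it.
        batch_size = frozen_features.size(0)
        queries = self.label_embeddings.unsqueeze(0).expand(batch_size, -1, -1)
        aligned, _ = self.cross_attn(queries, frozen_features, frozen_features)
        return self.scorer(aligned).squeeze(-1)  # (batch, num_labels)


if __name__ == "__main__":
    # Hypothetical usage with pre-computed frozen features standing in for
    # the output of a frozen PTM encoder.
    head = YTuningHead(num_labels=3, hidden_size=768)
    with torch.no_grad():
        features = torch.randn(4, 128, 768)
    logits = head(features)
    print(logits.shape)  # torch.Size([4, 3])
```

Because the PTM is frozen and its features can be pre-computed, only the head's parameters (label embeddings, attention, scorer) require gradients, which is what makes this kind of paradigm both parameter-efficient and training-efficient.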