Why can pre-trained language models (PLMs) learn universal representations and adapt effectively to a broad range of NLP tasks that differ greatly on the surface? In this work, we find empirical evidence that the adaptation of PLMs to various few-shot tasks can be reparameterized as optimizing only a few free parameters in a unified low-dimensional intrinsic task subspace, which may help explain why PLMs adapt easily to diverse NLP tasks with small-scale data. To find such a subspace and examine its universality, we propose an analysis pipeline called intrinsic prompt tuning (IPT). Specifically, building on the recent success of prompt tuning, we decompose the soft prompts of multiple NLP tasks into the same low-dimensional nonlinear subspace, and then learn to adapt the PLM to unseen data or tasks by tuning only the parameters in this subspace. In our experiments, we study diverse few-shot NLP tasks and, surprisingly, find that in a 250-dimensional subspace found with 100 tasks, tuning only 250 free parameters recovers 97% and 83% of the full prompt tuning performance for the 100 seen tasks (using different training data) and 20 unseen tasks, respectively, which shows the strong generalization ability of the found intrinsic task subspace. Besides serving as an analysis tool, IPT could also bring practical benefits, such as improving the stability of prompt tuning.
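The reparameterization described above can be illustrated with a minimal sketch: a frozen nonlinear decoder maps a 250-dimensional intrinsic vector to a full soft prompt, so that only the 250 entries of that vector are free during adaptation. Only the intrinsic dimension of 250 comes from the abstract. The decoder architecture (one tanh hidden layer), the prompt length, the model width, the randomly initialized weights standing in for a decoder learned on the seen tasks, and all names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

INTRINSIC_DIM = 250  # subspace dimension reported in the abstract
PROMPT_LEN = 20      # assumed number of soft prompt tokens (illustrative)
MODEL_DIM = 64       # assumed PLM embedding width (illustrative)
HIDDEN = 128         # assumed decoder hidden width (illustrative)

rng = np.random.default_rng(0)

# Stand-in for a decoder learned from the soft prompts of the seen tasks.
# Random weights are used here; in this sketch they stay frozen.
W1 = rng.normal(scale=0.1, size=(INTRINSIC_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, PROMPT_LEN * MODEL_DIM))
b2 = np.zeros(PROMPT_LEN * MODEL_DIM)


def decode_prompt(z: np.ndarray) -> np.ndarray:
    """Map an intrinsic vector to a soft prompt of shape (PROMPT_LEN, MODEL_DIM)."""
    h = np.tanh(z @ W1 + b1)   # nonlinearity makes the subspace nonlinear
    flat = h @ W2 + b2         # project up to the full prompt size
    return flat.reshape(PROMPT_LEN, MODEL_DIM)


# The only free parameters when adapting to a new task or new data.
z = np.zeros(INTRINSIC_DIM)
soft_prompt = decode_prompt(z)

free_params = z.size                   # parameters tuned in the subspace
full_prompt_params = soft_prompt.size  # parameters tuned by full prompt tuning
```

In this sketch, adapting to a new task would mean optimizing `z` against the task loss while the decoder and the PLM stay fixed; the decoded `soft_prompt` is what would be prepended to the input embeddings.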