Recent research has shown that large language models pretrained with unsupervised approaches can achieve significant performance improvements on many downstream tasks. Typically, when adapting these language models to a downstream task, such as classification or regression, we employ a fine-tuning paradigm in which the sentence representation from the language model is fed into a task-specific head; the model is then fine-tuned end-to-end. However, with the emergence of models like GPT-3, prompt-based fine-tuning has proven to be a successful approach for few-shot tasks. Inspired by this work, we study discrete prompt techniques in practice. Two issues arise with the standard prompt approach. First, it can overfit to the prompt template. Second, it requires manual effort to formulate the downstream task as a language modeling problem. In this paper, we propose an improvement to prompt-based fine-tuning that addresses these two issues. We refer to our approach as DynaMaR (Dynamic Prompt with Mask Token Representation). Results show that DynaMaR achieves an average improvement of 10% in few-shot settings and of 3.7% in data-rich settings over the standard fine-tuning approach on four e-commerce applications.
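To make the contrast between the two adaptation paradigms described above concrete, the sketch below shows (i) standard fine-tuning, where a sentence representation is passed to a task-specific classification head, and (ii) prompt-based fine-tuning, where the task is recast as masked-token prediction over a hand-written template. This is a minimal illustration under assumptions, not the paper's implementation: the roberta-base checkpoint, the prompt template, and the verbalizer mapping are placeholders chosen for demonstration, and the training loops are omitted.

```python
# Minimal sketch of the two adaptation paradigms (forward passes only).
# The checkpoint, prompt template, and verbalizer are illustrative assumptions.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForMaskedLM,
)

text = "The delivery was fast and the product works great."
tok = AutoTokenizer.from_pretrained("roberta-base")

# --- (i) Standard fine-tuning: sentence representation -> task-specific head ---
clf = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
inputs = tok(text, return_tensors="pt")
clf_logits = clf(**inputs).logits  # shape (1, 2); the head would be trained end-to-end

# --- (ii) Prompt-based fine-tuning: recast the task as masked-token prediction ---
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base")
prompt = f"{text} Overall it was {tok.mask_token}."        # hand-written discrete prompt (assumed)
verbalizer = {"positive": "great", "negative": "terrible"}  # label-to-word mapping (assumed)

enc = tok(prompt, return_tensors="pt")
mask_pos = (enc.input_ids == tok.mask_token_id).nonzero(as_tuple=True)[1]
mask_logits = mlm(**enc).logits[0, mask_pos]  # distribution over the vocabulary at [MASK]

# Score each label by the logit of its verbalizer token at the mask position.
label_scores = {}
for label, word in verbalizer.items():
    word_id = tok(" " + word, add_special_tokens=False).input_ids[0]
    label_scores[label] = mask_logits[0, word_id].item()
print(label_scores)
```

In the prompt-based variant, no new parameters are introduced for the task; the pretrained masked-language-modeling head itself produces the label scores, which is what makes this style of adaptation attractive in few-shot settings. The reliance on a single hand-written template is also what exposes the two issues the abstract raises: overfitting to the prompt and the manual effort of writing it.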