Recent advances in large pre-trained language models (PLMs) have led to impressive gains on natural language understanding (NLU) tasks with task-specific fine-tuning. However, directly fine-tuning PLMs relies heavily on a large amount of labeled data, which is usually hard to obtain. Prompt-based tuning of PLMs has proven valuable for various few-shot tasks. Existing work on prompt-based tuning for few-shot NLU tasks mainly focuses on deriving proper label words with a verbalizer or on generating prompt templates that elicit semantics from PLMs. In addition, conventional data augmentation methods have also proven useful for few-shot tasks. However, few data augmentation methods have been designed for the prompt-based tuning paradigm. We therefore study the new problem of data augmentation for prompt-based few-shot learners. Since label semantics are essential in prompt-based tuning, we propose PromptDA, a novel label-guided data augmentation method that exploits enriched label semantic information for augmentation. Extensive experimental results on few-shot text classification tasks show that our framework achieves superior performance by effectively leveraging label semantics and data augmentation for natural language understanding.
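To make the prompt-based tuning paradigm discussed above concrete, the following is a minimal sketch of verbalizer-based classification with a masked PLM, using the Hugging Face transformers library. It is not the paper's implementation; the template ("It was [MASK].") and the label words in the verbalizer are illustrative assumptions.

```python
# Minimal sketch of prompt-based classification with a verbalizer.
# Template and label words are assumed for illustration, not taken
# from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Verbalizer: maps each class to one label word (an assumed choice).
verbalizer = {"positive": "great", "negative": "terrible"}
label_word_ids = {
    label: tokenizer.convert_tokens_to_ids(tokenizer.tokenize(" " + word))[0]
    for label, word in verbalizer.items()
}

def classify(sentence: str) -> str:
    # Wrap the input in a cloze-style template; the PLM fills the mask.
    prompt = f"{sentence} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the mask position in the input sequence.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    # Score each class by its label word's logit at the mask position.
    scores = {label: logits[0, mask_pos, wid].item()
              for label, wid in label_word_ids.items()}
    return max(scores, key=scores.get)

print(classify("The movie was a delight from start to finish."))
```

Because predictions hinge on the logits of the label words, augmentation methods tailored to this paradigm, such as the label-guided approach the abstract proposes, can exploit those label semantics rather than perturbing inputs blindly.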