Recent advances in large pre-trained language models (PLMs) have led to impressive gains on natural language understanding (NLU) tasks with task-specific fine-tuning. However, directly fine-tuning PLMs relies heavily on large amounts of labeled instances, which are expensive and time-consuming to obtain. Prompt-based tuning of PLMs has proven valuable for few-shot tasks. Existing work on prompt-based tuning for few-shot NLU mainly focuses on deriving proper label words with a verbalizer or on generating prompt templates that elicit semantics from PLMs. In addition, conventional data augmentation methods have also been verified to be useful for few-shot tasks. However, there are currently few data augmentation methods designed for the prompt-based tuning paradigm. Therefore, we study a new problem: data augmentation for prompt-based few-shot learners. Since label semantics are helpful in prompt-based tuning, we propose a novel label-guided data augmentation method, PromptDA, which exploits enriched label semantic information for data augmentation. Experimental results on several few-shot text classification tasks show that our proposed framework achieves superior performance by effectively leveraging label semantics and data augmentation in language understanding.