Pretrained language models (PLMs) have demonstrated remarkable performance on various natural language processing tasks: unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities, while bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: a unidirectional PLM generates class-conditioned texts guided by prompts, which are then used as training data for fine-tuning a bidirectional PLM. With quality training data selected based on generation probability, and with regularization techniques (label smoothing and temporal ensembling) applied during fine-tuning for better generalization and stability, our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark (e.g., 72.3/73.8 on MNLI-m/mm and 92.8 on SST-2), significantly outperforming zero-shot prompting methods and even achieving results comparable to strong few-shot approaches that use 32 training samples per class.
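To make the described pipeline concrete, below is a minimal sketch (assuming PyTorch and the Hugging Face transformers library) of the three steps: class-conditioned generation with a unidirectional PLM, selection of generated texts by their average token log-probability, and fine-tuning a bidirectional PLM with label smoothing. The prompts, model choices, selection ratio, and hyperparameters are illustrative assumptions rather than the paper's exact settings, and temporal ensembling is omitted for brevity.

```python
# Sketch of generate -> select -> fine-tune for zero-shot NLU (illustrative only).
import torch
import torch.nn.functional as F
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForSequenceClassification)

device = "cuda" if torch.cuda.is_available() else "cpu"

# --- Step 1: class-conditioned generation with a unidirectional PLM (e.g., GPT-2) ---
gen_tok = AutoTokenizer.from_pretrained("gpt2-large")
gen_lm = AutoModelForCausalLM.from_pretrained("gpt2-large").to(device).eval()

# Hypothetical label-descriptive prompts for a sentiment task such as SST-2.
prompts = {0: "Movie review with a negative sentiment:",
           1: "Movie review with a positive sentiment:"}

def generate_examples(label, prompt, n=8, max_new_tokens=40):
    """Sample texts for one class and score them by average token log-probability."""
    enc = gen_tok(prompt, return_tensors="pt").to(device)
    prompt_len = enc["input_ids"].shape[1]
    with torch.no_grad():
        out = gen_lm.generate(**enc, do_sample=True, top_p=0.9,
                              max_new_tokens=max_new_tokens,
                              num_return_sequences=n,
                              pad_token_id=gen_tok.eos_token_id)
    examples = []
    for seq in out:
        with torch.no_grad():
            logits = gen_lm(seq.unsqueeze(0)).logits[0]
        # logits[t] predicts token t+1, so score only the generated continuation.
        logps = F.log_softmax(logits[:-1], dim=-1)
        cont = seq[prompt_len:]
        cont_logps = logps[prompt_len - 1:].gather(1, cont.unsqueeze(1)).squeeze(1)
        text = gen_tok.decode(cont, skip_special_tokens=True).strip()
        examples.append((text, label, cont_logps.mean().item()))
    return examples

synthetic = []
for label, prompt in prompts.items():
    synthetic += generate_examples(label, prompt)

# --- Step 2: keep the most probable generations as (noisy) training data ---
synthetic.sort(key=lambda x: x[2], reverse=True)
train_data = synthetic[: len(synthetic) // 2]   # keep top half (illustrative ratio)

# --- Step 3: fine-tune a bidirectional PLM (e.g., BERT) with label smoothing ---
clf_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(prompts)).to(device)
optim = torch.optim.AdamW(clf.parameters(), lr=2e-5)
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

clf.train()
for text, label, _ in train_data:
    batch = clf_tok(text, return_tensors="pt", truncation=True).to(device)
    logits = clf(**batch).logits
    loss = loss_fn(logits, torch.tensor([label], device=device))
    loss.backward()
    optim.step()
    optim.zero_grad()
```

A full implementation would additionally apply temporal ensembling (maintaining a moving average of the classifier's predictions on the synthetic data to filter or re-weight noisy labels) and batch the fine-tuning loop; the single-example updates above are kept only to show the data flow.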