Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches that learn only from the small training set still underperform fully supervised training by nontrivial margins. In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood, where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. A classification PLM can then be fine-tuned on both the few-shot and the synthetic samples with regularization for better generalization and stability. Our approach, FewGen, achieves overall better results than existing few-shot learning methods across seven classification tasks of the GLUE benchmark, improving over no-augmentation methods by 5+ average points and outperforming augmentation methods by 3+ average points.
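To make the weighted maximum likelihood objective concrete, below is a minimal PyTorch sketch assuming a Hugging Face causal LM: each token's negative log-likelihood is scaled by a per-token weight before averaging. In FewGen these weights would be set by the discriminative meta-learning objective; here `token_weights` is a hypothetical stand-in, and the usage example simply passes uniform weights (recovering standard MLE) as a baseline.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for the autoregressive generator PLM.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def weighted_mle_loss(input_ids, attention_mask, token_weights):
    """Per-token cross-entropy, scaled by token_weights and averaged
    over non-padding positions. token_weights is a placeholder for the
    meta-learned weights described in the abstract."""
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Shift so that position t predicts token t+1.
    logits = logits[:, :-1, :]
    labels = input_ids[:, 1:]
    weights = token_weights[:, 1:] * attention_mask[:, 1:]
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).view_as(labels)
    return (weights * nll).sum() / weights.sum().clamp(min=1.0)

# Baseline usage: uniform weights reduce this to standard MLE tuning
# on a prompt-formatted few-shot sample (illustrative text only).
batch = tokenizer(
    ["This movie was great. Sentiment: positive"], return_tensors="pt"
)
uniform_w = torch.ones_like(batch["input_ids"], dtype=torch.float)
loss = weighted_mle_loss(
    batch["input_ids"], batch["attention_mask"], uniform_w
)
loss.backward()
```

Upweighting tokens that distinguish one label's samples from another's is what pushes the generator toward label-discriminative outputs rather than generic fluent text.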