Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available. To address this shortcoming, we propose STraTA, which stands for Self-Training with Task Augmentation, an approach that builds on two key ideas for effective leverage of unlabeled data. First, STraTA uses task augmentation, a novel technique that synthesizes a large amount of data for auxiliary-task fine-tuning from target-task unlabeled texts. Second, STraTA performs self-training by further fine-tuning the strong base model created by task augmentation on a broad distribution of pseudo-labeled data. Our experiments demonstrate that STraTA can substantially improve sample efficiency across 12 few-shot benchmarks. Remarkably, on the SST-2 sentiment dataset, STraTA, with only 8 training examples per class, achieves comparable results to standard fine-tuning with 67K training examples. Our analyses reveal that task augmentation and self-training are both complementary and independently effective.
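As a rough illustration of the self-training step described above, the sketch below uses a scikit-learn bag-of-words classifier as a stand-in for the fine-tuned language model (in STraTA, the base learner would be the model produced by task augmentation). The `self_train` function, the confidence threshold, and the number of rounds are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy.sparse import vstack
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def self_train(labeled_texts, labels, unlabeled_texts, rounds=3, threshold=0.9):
        """Toy self-training loop: pseudo-label confident unlabeled examples, then retrain."""
        vectorizer = TfidfVectorizer().fit(list(labeled_texts) + list(unlabeled_texts))
        X_lab = vectorizer.transform(labeled_texts)
        y_lab = np.asarray(labels)
        X_unlab = vectorizer.transform(unlabeled_texts)

        # Fit the base learner on the handful of labeled examples.
        model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        for _ in range(rounds):
            probs = model.predict_proba(X_unlab)
            confident = probs.max(axis=1) >= threshold          # keep only confident predictions
            pseudo_y = model.classes_[probs.argmax(axis=1)][confident]
            X_train = vstack([X_lab, X_unlab[confident]])
            y_train = np.concatenate([y_lab, pseudo_y])
            # Retrain from the base learner on labeled + pseudo-labeled data each round.
            model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        return model

The confidence threshold controls the trade-off between pseudo-label coverage and noise; the broad-distribution variant in the paper instead keeps a wide range of pseudo-labeled examples rather than only the most confident ones.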