Large pretrained language models (LMs) such as BERT have improved performance on many disparate natural language processing (NLP) tasks. However, fine-tuning such models requires a large number of training examples for each target task. Meanwhile, many realistic NLP problems are "few-shot", lacking a sufficiently large training set. In this work, we propose a novel conditional neural process-based approach for few-shot text classification that learns to transfer from other, diverse tasks with rich annotation. Our key idea is to represent each task using gradient information from a base model and to train an adaptation network that modulates a text classifier conditioned on the task representation. While previous task-aware few-shot learners represent tasks by encoding their inputs, our task representation is more powerful, since the gradient captures the input-output relationships of a task. Experimental results show that our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches on a collection of diverse few-shot tasks. We further conduct analyses and ablations to justify our design choices.
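To make the central idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation. The class names (`GradientTaskEncoder`, `FiLMAdapter`) and dimensions (`grad_dim`, `task_dim`, `hidden_dim`) are illustrative assumptions; a real system would take gradients with respect to only a small subset of base-model parameters rather than flattening all of them. The sketch shows the two ingredients named in the abstract: gradients of the support-set loss serving as the task representation, and an adaptation network that uses that representation to modulate the classifier's hidden features.

```python
import torch
import torch.nn as nn


class GradientTaskEncoder(nn.Module):
    """Hypothetical sketch: represent a task by the gradient of the base
    model's loss on the task's support set, projected to a task embedding."""

    def __init__(self, grad_dim, task_dim):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(grad_dim, task_dim),
            nn.ReLU(),
            nn.Linear(task_dim, task_dim),
        )

    def forward(self, base_model, loss_fn, support_x, support_y):
        logits = base_model(support_x)
        loss = loss_fn(logits, support_y)
        # Gradients w.r.t. the (chosen) base-model parameters act as the task signature.
        params = [p for p in base_model.parameters() if p.requires_grad]
        grads = torch.autograd.grad(loss, params)
        flat = torch.cat([g.detach().flatten() for g in grads])
        return self.proj(flat)  # task embedding


class FiLMAdapter(nn.Module):
    """Hypothetical adaptation network: produces a per-feature scale and shift
    (FiLM-style modulation) for the classifier, conditioned on the task embedding."""

    def __init__(self, task_dim, hidden_dim):
        super().__init__()
        self.to_gamma = nn.Linear(task_dim, hidden_dim)
        self.to_beta = nn.Linear(task_dim, hidden_dim)

    def forward(self, hidden, task_emb):
        gamma = self.to_gamma(task_emb)
        beta = self.to_beta(task_emb)
        return gamma * hidden + beta  # modulated features fed to the classifier head
```

In this sketch, meta-training would optimize the encoder and adapter across many annotated source tasks, so that at test time a few-shot target task is handled by a single forward pass (support-set gradient, then modulation) rather than per-task fine-tuning of the base model.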