The ability to continuously expand knowledge over time and utilize it to rapidly generalize to new tasks is a key feature of human linguistic intelligence. Existing models that pursue rapid generalization to new tasks (e.g., few-shot learning methods), however, are mostly trained in a single shot on fixed datasets and are unable to dynamically expand their knowledge, while continual learning algorithms are not specifically designed for rapid generalization. We present a new learning setup, Continual Learning of Few-Shot Learners (CLIF), to address the challenges of both learning settings in a unified setup. In CLIF, a model learns from a sequence of diverse NLP tasks arriving sequentially, accumulating knowledge for improved generalization to new tasks while also retaining performance on the tasks learned earlier. We examine how generalization ability is affected in the continual learning setup, evaluate a number of continual learning algorithms, and propose a novel regularized adapter generation approach. We find that catastrophic forgetting affects generalization ability to a lesser degree than performance on seen tasks, and that continual learning algorithms can still bring considerable benefit to generalization ability.