Continual learning (CL) aims to learn a sequence of tasks over time, with the data distribution shifting from one task to another. When training on new task data, the representations of data from old tasks may drift. Such negative representation drift can cause catastrophic forgetting by making the locally learned class prototypes and data representations correlate poorly across tasks. To mitigate this drift, we propose a method that finds global prototypes to guide the learning, and learns data representations regularized by self-supervised information. Specifically, for NLP tasks, we formulate each task in a masked language modeling style and learn it through a neighbor attention mechanism over a pre-trained language model. Experimental results show that the proposed method learns fairly consistent representations with less representation drift, and significantly reduces catastrophic forgetting in CL without resampling data from past tasks.
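To make the masked-language-modeling formulation concrete, a minimal sketch of how a classification instance can be recast as a cloze-style MLM query is shown below. The template, mask token, and verbalizer mapping here are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch: recasting a text-classification example in MLM style.
# The template and verbalizer are assumptions for illustration only.

MASK_TOKEN = "[MASK]"

def to_mlm_format(text: str) -> str:
    """Wrap an input sentence in a cloze template whose masked position
    a pre-trained language model is asked to fill with a label word."""
    return f"{text} It was {MASK_TOKEN}."

# Hypothetical verbalizer: maps the word predicted at the masked
# position back to a class label.
VERBALIZER = {"great": "positive", "terrible": "negative"}

def word_to_label(predicted_word: str) -> str:
    """Map a predicted label word to its class; unknown words fall through."""
    return VERBALIZER.get(predicted_word, "unknown")

query = to_mlm_format("The movie was a delight.")
# query == "The movie was a delight. It was [MASK]."
```

Under this framing, every task shares the language model's output space (its vocabulary), which is what allows prototypes and representations to be compared across tasks.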