Continual learning (CL) learns a sequence of tasks incrementally with two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus perform poorly in KT. Although several papers have tried to deal with both CF and KT, our experiments show that they suffer from serious CF when the tasks do not share much knowledge. Another observation is that most current CL methods do not use pre-trained models, even though such models have been shown to significantly improve end-task performance. For example, in natural language processing, fine-tuning a BERT-like pre-trained language model is one of the most effective approaches. However, for CL, this approach suffers from serious CF. An interesting question is thus how to make the best use of pre-trained models for CL. This paper proposes a novel model called CTR to solve these problems. Our experimental results demonstrate the effectiveness of CTR.
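To make the CF problem with pre-trained models concrete, below is a minimal sketch (not the paper's CTR method) of naively fine-tuning a BERT-like model on a sequence of tasks, the baseline setting the abstract refers to. The task data loaders (`task_loaders`) and the label count are hypothetical placeholders.

```python
# Naive sequential fine-tuning of a pre-trained model on a task sequence.
# With no CL mechanism, weights learned on earlier tasks are overwritten
# (catastrophic forgetting) and there is no explicit knowledge transfer.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def fine_tune_on_task(model, task_loader, epochs=3):
    """Standard fine-tuning on one task's data."""
    model.train()
    for _ in range(epochs):
        for batch in task_loader:
            # batch is assumed to hold input_ids, attention_mask, labels
            outputs = model(**batch)
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# task_loaders: hypothetical list of DataLoaders, one per task in the sequence.
for task_loader in task_loaders:
    fine_tune_on_task(model, task_loader)
```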