Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tends to drop significantly. This problem is often referred to as catastrophic forgetting, a key challenge in the continual learning of neural networks. The regularization-based approach is one of the primary classes of methods for alleviating catastrophic forgetting. In this paper, we provide a novel viewpoint on regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. This viewpoint leads to a unified framework that can be instantiated to derive many existing algorithms, such as Elastic Weight Consolidation and the Kronecker-factored Laplace approximation. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as the generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning. Our theoretical results indicate the importance of an accurate approximation of the Hessian matrix. Experimental results on several benchmarks provide empirical validation of our theoretical findings.
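To illustrate the viewpoint described above, the following is a minimal sketch of the second-order Taylor formulation; the notation ($\ell_k$, $\theta_k^*$, $H_k$, $\lambda$) is ours for exposition and may differ from the paper's. Letting $\theta_k^*$ denote the parameters obtained after training on task $k$ with loss $\ell_k$, expanding $\ell_k$ to second order around $\theta_k^*$ and dropping the first-order term (which vanishes at an approximate stationary point) gives
\[
\ell_k(\theta) \;\approx\; \ell_k(\theta_k^*) + \tfrac{1}{2}\,(\theta - \theta_k^*)^\top H_k\,(\theta - \theta_k^*),
\]
where $H_k$ is an approximation of the Hessian of $\ell_k$ at $\theta_k^*$. A regularization-based method then trains task $t$ by minimizing
\[
\ell_t(\theta) \;+\; \tfrac{\lambda}{2}\sum_{k<t}(\theta - \theta_k^*)^\top H_k\,(\theta - \theta_k^*).
\]
Under this sketch, different choices of $H_k$ would recover different existing algorithms: a diagonal (Fisher-based) approximation corresponds to Elastic Weight Consolidation, while a Kronecker-factored curvature approximation corresponds to the Kronecker-factored Laplace approximation.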