In the setting of continual learning, a network is trained on a sequence of tasks and suffers from catastrophic forgetting. To balance the plasticity and stability of the network in continual learning, in this paper we propose a novel network training algorithm called Adam-NSCL, which sequentially optimizes network parameters in the null space of previous tasks. We first propose two mathematical conditions, respectively, for achieving network stability and plasticity in continual learning. Based on them, network training for sequential tasks can be simply achieved by projecting the candidate parameter update into the approximate null space of all previous tasks during network training, where the candidate parameter update can be generated by Adam. The approximate null space can be derived by applying singular value decomposition to the uncentered covariance matrix of all input features of previous tasks for each linear layer. For efficiency, the uncentered covariance matrix can be incrementally updated after learning each task. We also empirically verify the rationality of the approximate null space at each linear layer. We apply our approach to training networks for continual learning on the benchmark datasets CIFAR-100 and TinyImageNet, and the results suggest that the proposed approach outperforms or matches state-of-the-art continual learning approaches.
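The following is a minimal sketch, in PyTorch, of the per-layer machinery described above: incrementally accumulating the uncentered covariance of a linear layer's input features, extracting an approximate null-space basis via SVD, and projecting a candidate (e.g., Adam-generated) update into that space. The function names, shapes, and the relative threshold `eps` are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def update_uncentered_covariance(cov, count, features):
    """Incrementally accumulate the uncentered covariance of layer inputs.

    `features` has shape (batch, in_dim); `cov` has shape (in_dim, in_dim);
    `count` is the number of samples already accumulated.
    """
    batch = features.shape[0]
    new_count = count + batch
    # Uncentered covariance of the current batch (no mean subtraction).
    batch_cov = features.t() @ features / batch
    # Running average weighted by sample counts.
    cov = cov * (count / new_count) + batch_cov * (batch / new_count)
    return cov, new_count

def approximate_null_space(cov, eps=1e-2):
    """Return a basis of the approximate null space of the feature covariance.

    Singular vectors whose singular values are small (below `eps` times the
    largest singular value, an assumed criterion) span directions along which
    previous-task features are barely changed.
    """
    U, S, _ = torch.linalg.svd(cov)
    keep = S <= eps * S.max()
    return U[:, keep]  # shape (in_dim, null_dim)

def project_update(candidate_update, null_basis):
    """Project a candidate weight update of shape (out_dim, in_dim) into the null space."""
    # Right-multiplying by the projector U0 @ U0^T constrains the update so the
    # layer's response to previous-task features is approximately unchanged.
    return candidate_update @ null_basis @ null_basis.t()
```

In a training loop, one would accumulate the covariance over each finished task, recompute the null-space basis, and then replace each Adam step for that layer's weight with its projected version before applying it.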