Learning multiple tasks sequentially without forgetting previously acquired knowledge, known as Continual Learning (CL), remains a long-standing challenge for neural networks. Most existing methods rely on additional network capacity or data replay. In contrast, we introduce a novel approach that we refer to as Recursive Gradient Optimization (RGO). RGO combines an iteratively updated optimizer that modifies the gradient to minimize forgetting without data replay, and a virtual Feature Encoding Layer (FEL) that represents different long-term structures using only task descriptors. Experiments show that RGO significantly outperforms the baselines on popular continual classification benchmarks and achieves new state-of-the-art results on 20-split-CIFAR100 (82.22%) and 20-split-miniImageNet (72.63%). Since it attains higher average accuracy than Single-Task Learning (STL), this method offers a flexible and reliable way to add continual learning capabilities to any model trained with gradient descent.
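To make the gradient-modification idea above concrete, here is a minimal numpy sketch of a recursive-least-squares style gradient projection on a toy linear model. This is an illustration of the general technique (projecting new-task gradients with a recursively updated matrix built from past inputs), not the exact RGO update rule; all names, hyperparameters, and the toy setup are hypothetical.

```python
# Hypothetical sketch: continual learning by projecting gradients with a
# recursively updated matrix P, so new-task updates interfere less with
# directions that mattered for earlier tasks. Not the exact RGO algorithm.
import numpy as np

rng = np.random.default_rng(0)

def rls_update(P, x, alpha=1.0):
    """Recursive least-squares style update of the projection matrix P for one input x."""
    x = x.reshape(-1, 1)
    Px = P @ x
    return P - (Px @ Px.T) / (alpha + (x.T @ Px).item())

# Toy setup: a single linear layer y = W x trained on two "tasks" in sequence.
d_in, d_out = 8, 3
W = rng.normal(scale=0.1, size=(d_out, d_in))
P = np.eye(d_in)                              # identity: no constraint before the first task

for task in range(2):
    # Each task is a random linear mapping to fit.
    W_true = rng.normal(size=(d_out, d_in))
    X = rng.normal(size=(100, d_in))
    Y = X @ W_true.T
    for _ in range(200):
        grad = (W @ X.T - Y.T) @ X / len(X)   # MSE gradient w.r.t. W
        W -= 0.1 * grad @ P                   # modify (project) the gradient before applying it
    for x in X:                               # after the task, fold its inputs into P
        P = rls_update(P, x)
    print(f"task {task}: fit error = {np.mean((X @ W.T - Y) ** 2):.4f}")
```

After the first task, P shrinks along the directions spanned by that task's inputs, so subsequent updates are steered toward directions that leave the earlier solution largely intact, which is the forgetting-minimization behavior the abstract describes.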