In neural networks, continual learning causes gradient interference among sequential tasks, leading to catastrophic forgetting of old tasks while learning new ones. Recent methods address this issue by storing the important gradient spaces of old tasks and updating the model orthogonally to them when learning new tasks. However, such restrictive orthogonal gradient updates hamper learning on the new tasks, resulting in suboptimal performance. To improve new learning while minimizing forgetting, in this paper we propose a Scaled Gradient Projection (SGP) method that combines orthogonal gradient projections with scaled gradient steps along the important gradient spaces of past tasks. The degree of gradient scaling along these spaces depends on the importance of the bases spanning them. We propose an efficient method for computing and accumulating the importance of these bases using the singular value decomposition of the input representations for each task. We conduct extensive experiments ranging from continual image classification to reinforcement learning tasks and report better performance with lower training overhead than state-of-the-art approaches.
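The abstract's core idea, orthogonal projection softened by per-basis importance scaling, can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's actual algorithm: the basis is taken from the SVD of a matrix of input representations, and the importance weights are assumed here to be normalized squared singular values, an illustrative choice. When a basis direction has importance 1 the update reduces to a fully orthogonal projection; lower importance permits a scaled step along that direction.

```python
import numpy as np

def compute_basis_importance(reps, energy=0.99):
    """reps: (features x samples) matrix of input representations for a task.
    Returns a basis of important directions and illustrative importance
    weights in (0, 1] derived from normalized squared singular values."""
    U, S, _ = np.linalg.svd(reps, full_matrices=False)
    # Keep the leading bases capturing `energy` fraction of total spectral energy.
    ratios = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(ratios, energy)) + 1
    basis = U[:, :k]
    # Hypothetical importance: squared singular values scaled to a max of 1.
    importance = (S[:k] ** 2) / (S[:k] ** 2).max()
    return basis, importance

def sgp_update(grad, basis, importance):
    """Scaled gradient projection: subtract the importance-weighted component
    of `grad` that lies in the stored subspace. importance = 1 removes the
    component entirely (orthogonal projection); smaller values retain a
    scaled step along that basis direction."""
    coords = basis.T @ grad
    return grad - basis @ (importance * coords)
```

With all importances set to 1 this recovers a strictly orthogonal update; with the SVD-derived weights, less important directions of the old-task subspace remain partially available for new learning.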