In neural networks, continual learning causes gradient interference among sequential tasks, leading to catastrophic forgetting of old tasks while learning new ones. Recent methods address this issue by storing the important gradient spaces of old tasks and updating the model orthogonally to them when learning new tasks. However, such restrictive orthogonal gradient updates hamper learning on the new tasks, resulting in suboptimal performance. To improve new learning while minimizing forgetting, in this paper we propose a Scaled Gradient Projection (SGP) method that combines orthogonal gradient projections with scaled gradient steps along the important gradient spaces of past tasks. The degree of gradient scaling along these spaces depends on the importance of the bases spanning them. We propose an efficient method for computing and accumulating the importance of these bases using the singular value decomposition of the input representations for each task. We conduct extensive experiments ranging from continual image classification to reinforcement learning tasks and report better performance with lower training overhead than state-of-the-art approaches.
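The abstract's core idea, orthogonal projection softened by per-basis importance scaling, can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's actual algorithm: the basis is taken from the SVD of a matrix of input representations, and the importance weights are assumed here to be normalized squared singular values, an illustrative choice. When a basis direction has importance 1 the update reduces to a fully orthogonal projection; lower importance permits a scaled step along that direction.

```python
import numpy as np

def compute_basis_importance(reps, energy=0.99):
    """reps: (features x samples) matrix of input representations for a task.
    Returns a basis of important directions and illustrative importance
    weights in (0, 1] derived from normalized squared singular values."""
    U, S, _ = np.linalg.svd(reps, full_matrices=False)
    # Keep the leading bases capturing `energy` fraction of total spectral energy.
    ratios = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(ratios, energy)) + 1
    basis = U[:, :k]
    # Hypothetical importance: squared singular values scaled to a max of 1.
    importance = (S[:k] ** 2) / (S[:k] ** 2).max()
    return basis, importance

def sgp_update(grad, basis, importance):
    """Scaled gradient projection: subtract the importance-weighted component
    of `grad` that lies in the stored subspace. importance = 1 removes the
    component entirely (orthogonal projection); smaller values retain a
    scaled step along that basis direction."""
    coords = basis.T @ grad
    return grad - basis @ (importance * coords)
```

With all importances set to 1 this recovers a strictly orthogonal update; with the SVD-derived weights, less important directions of the old-task subspace remain partially available for new learning.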