Layerwise Optimization by Gradient Decomposition for Continual Learning

Deep neural networks achieve state-of-the-art, and sometimes super-human, performance across many domains. However, when tasks are learned sequentially, these networks easily forget the knowledge of previous tasks, a phenomenon known as "catastrophic forgetting". To keep the update consistent with both the old tasks and the new task, one effective solution is to modify the gradient used for the update. Previous methods impose independent gradient constraints for each task, whereas we consider that these gradients contain complex information and propose to exploit inter-task information through gradient decomposition. Specifically, the gradient of each old task is decomposed into a part shared by all old tasks and a part specific to that task. The update gradient should be close to the gradient of the new task, consistent with the gradient shared by all old tasks, and orthogonal to the space spanned by the gradients specific to the old tasks. In this way, our approach encourages the consolidation of common knowledge without impairing task-specific knowledge. Furthermore, the optimization is performed separately on the gradients of each layer, rather than on the concatenation of all gradients as in previous works. This effectively avoids the influence of the variation in gradient magnitude across layers. Extensive experiments validate the effectiveness of both the gradient-decomposed optimization and the layer-wise updates, and the proposed method achieves state-of-the-art results on various continual-learning benchmarks.
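The update rule described in the abstract can be illustrated with a minimal pure-Python sketch on flat per-layer gradient vectors. This is not the authors' implementation. It assumes that the shared component is the mean of the old-task gradients, enforces strict orthogonality to the full span of the task-specific components, and replaces a general quadratic-programming solver with a closed-form projection: first onto the orthogonal complement of the task-specific subspace, then onto the half-space where the inner product with the shared gradient is non-negative. The helper names (`decomposed_update`, `layerwise_update`) and the layer keys used below are hypothetical.

```python
# Hypothetical helpers illustrating gradient decomposition; not the paper's code.
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y elementwise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def orthonormal_basis(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Modified Gram-Schmidt; numerically dependent vectors are dropped."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for u in basis:
            w = axpy(-dot(w, u), u, w)
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project_out(g: Vector, basis: List[Vector]) -> Vector:
    """Remove the components of g lying in span(basis); basis must be orthonormal."""
    for u in basis:
        g = axpy(-dot(g, u), u, g)
    return g


def decomposed_update(g_new: Vector, old_grads: List[Vector]) -> Vector:
    """Closest vector to g_new that is orthogonal to every task-specific
    gradient and has a non-negative inner product with the shared gradient."""
    if not old_grads:
        return list(g_new)
    k, n = len(old_grads), len(g_new)
    # Assumption: the shared part is the mean of the old-task gradients.
    shared = [sum(g[j] for g in old_grads) / k for j in range(n)]
    # Task-specific part: each old gradient minus the shared part.
    specific = [[g[j] - shared[j] for j in range(n)] for g in old_grads]
    basis = orthonormal_basis(specific)
    # Step 1: project onto the orthogonal complement of the specific subspace.
    g = project_out(g_new, basis)
    # Step 2: enforce <g, shared> >= 0 inside that complement. p is the shared
    # gradient projected the same way, so <g, p> == <g, shared> and the
    # correction along p keeps g orthogonal to the specific subspace.
    p = project_out(shared, basis)
    pp, c = dot(p, p), dot(g, p)
    if c < 0 and pp > 1e-12:
        g = axpy(-c / pp, p, g)
    return g


def layerwise_update(new_grads: Dict[str, Vector],
                     old_grads: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Solve one independent problem per layer instead of one problem over the
    concatenation of all layers, so a layer with large gradient magnitudes
    cannot dominate the inner products that decide corrections elsewhere."""
    return {name: decomposed_update(g, old_grads[name])
            for name, g in new_grads.items()}
```

For example, with old-task gradients `[1, 1, 0]` and `[1, -1, 0]`, the shared part is `[1, 0, 0]` and the task-specific subspace is the second axis. A new-task gradient `[-2, 3, 1]` is mapped to `[0, 0, 1]`: its component along the second axis is removed, and the remaining component that conflicts with the shared direction is cancelled. Because both steps are exact Euclidean projections onto nested convex sets (a subspace, then a half-space inside it), the result is the closest feasible vector to the new-task gradient under these simplifying assumptions.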