Neural networks trained with backpropagation are notably susceptible to catastrophic forgetting: they tend to forget previously learned skills when learning new ones. To address this 'sensitivity-stability' dilemma, most previous efforts have been devoted to minimizing the empirical risk with various parameter regularization terms and episodic memory, but have rarely explored the use of the weight loss landscape. In this paper, we investigate the relationship between the weight loss landscape and sensitivity-stability in the continual learning scenario, and on this basis we propose a novel method, Flattening Sharpness for Dynamic Gradient Projection Memory (FS-DGPM). In particular, we introduce a soft weight to represent the importance of each basis representing past tasks in GPM, which can be adaptively learned during training, so that less important bases can be dynamically released to improve the sensitivity of new skill learning. We further introduce Flattening Sharpness (FS) to reduce the generalization gap by explicitly regulating the flatness of the weight loss landscape of all seen tasks. As demonstrated empirically, our proposed method consistently outperforms the baselines, showing a superior ability to learn new skills while effectively alleviating forgetting.
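As a rough, hedged illustration only (not the authors' released implementation; the function and variable names `project_gradient`, `flattening_sharpness_loss`, `bases`, `soft_weights`, and `rho` are assumptions introduced here), the PyTorch-style sketch below mirrors the two ingredients named above: a soft, learnable importance weight that scales how strongly the gradient is projected away from each stored GPM basis, and a SAM-style perturbation step that evaluates the loss at a nearby higher-loss point so that minimizing it flattens the weight loss landscape.

```python
import torch

def project_gradient(grad, bases, soft_weights):
    # Hypothetical soft gradient projection: each column of `bases` spans a
    # direction considered important for past tasks; sigmoid(soft_weights) in
    # [0, 1] scales how strongly the gradient component along that basis is
    # removed, so less important bases can be "released" for new learning.
    lam = torch.sigmoid(soft_weights)      # (k,) learnable importance per basis
    coeff = bases.T @ grad                 # (k,) gradient components along the bases
    return grad - bases @ (lam * coeff)    # (d,) softly projected gradient

def flattening_sharpness_loss(model, loss_fn, data, target, rho=0.05):
    # Hypothetical SAM-style sharpness-flattening step: perturb the weights
    # toward higher loss, take gradients of the loss at that nearby "sharp"
    # point, then restore the weights; minimizing this perturbed loss favors
    # flatter regions of the weight loss landscape.
    loss = loss_fn(model(data), target)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    scale = rho / (torch.cat([g.flatten() for g in grads]).norm() + 1e-12)
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(g, alpha=scale.item())          # ascend to a nearby sharper point
    perturbed_loss = loss_fn(model(data), target)  # loss at the perturbed weights
    perturbed_loss.backward()                      # gradients for the flatness objective
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(g, alpha=scale.item())          # restore the original weights
    return perturbed_loss.detach()

# Usage sketch: optimizer.zero_grad()
#               flattening_sharpness_loss(net, torch.nn.functional.cross_entropy, x, y)
#               optimizer.step()
```

In a full training loop, the perturbed loss would drive updates to both the network weights and the soft importance weights; the exact interleaving of those updates is not specified here.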