Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting previously learned information when learning new tasks. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically-inspired mechanisms, based on sparsity and heterogeneous dropout, that significantly increase a continual learner's performance over a long sequence of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task, together with a between-task heterogeneous dropout that encourages the network to use non-overlapping activation patterns for different tasks. In addition, we introduce two new benchmarks for continual learning under distributional shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Lastly, we provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on various benchmark continual learning problems.
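To make the two mechanisms above concrete, the sketch below shows one way a layer-wise k-winner activation and a between-task heterogeneous dropout could be combined in a single layer. It is a minimal PyTorch sketch under our own assumptions: the class name `KWinnerHeteroDropoutLayer`, the `start_new_task` hook, and the `temperature` hyperparameter are illustrative and not taken from the paper or the GPM codebase; the retention-probability rule (units that fired often on earlier tasks are retained with lower probability on new tasks) is one plausible instantiation of heterogeneous dropout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KWinnerHeteroDropoutLayer(nn.Module):
    """Hypothetical fully connected layer combining k-winner sparsity
    with heterogeneous dropout across tasks.

    k-winner: only the k largest activations per sample are kept; the rest
    are zeroed, enforcing layer-wise sparse activations.
    Heterogeneous dropout: units that were frequently active on previous
    tasks are dropped with higher probability on a new task, encouraging
    non-overlapping activation patterns between tasks.
    """

    def __init__(self, in_features, out_features, k, temperature=1.0):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.k = k
        self.temperature = temperature  # assumed knob controlling cross-task overlap
        # Running count of how often each unit was among the k winners.
        self.register_buffer("activation_counts", torch.zeros(out_features))
        # Per-unit retention probabilities used during the current task.
        self.register_buffer("retain_prob", torch.ones(out_features))

    def start_new_task(self):
        """Recompute retention probabilities from past activation counts."""
        normalized = self.activation_counts / (self.activation_counts.max() + 1e-8)
        # Frequently winning units get a lower retention probability.
        self.retain_prob = torch.exp(-normalized / self.temperature)

    def forward(self, x):
        h = F.relu(self.fc(x))
        if self.training:
            # Heterogeneous dropout: per-unit Bernoulli mask from retain_prob.
            mask = torch.bernoulli(self.retain_prob)
            h = h * mask
        # k-winner-take-all: keep only the top-k activations per sample.
        topk_vals, topk_idx = h.topk(self.k, dim=1)
        sparse_h = torch.zeros_like(h).scatter_(1, topk_idx, topk_vals)
        if self.training:
            # Track which units won, to bias dropout away from them later.
            self.activation_counts += (sparse_h > 0).float().sum(dim=0)
        return sparse_h
```

In this sketch, `start_new_task()` would be called at each task boundary so that the dropout mask for the new task is biased away from units heavily used by earlier tasks, while the k-winner step keeps each task's representation sparse; the actual method additionally projects gradients as in GPM, which is not shown here.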