Standard gradient descent algorithms applied to sequences of tasks are known to produce catastrophic forgetting in deep neural networks: when trained on a new task in a sequence, the model updates its parameters for the current task and forgets past knowledge. This article explores scenarios in which we scale the number of tasks in a finite environment. These scenarios consist of long sequences of tasks with recurring data. We show that in such a setting, stochastic gradient descent can learn, make progress, and converge to a solution that, according to the existing literature, would require a continual learning algorithm. In other words, we show that the model retains and accumulates knowledge without any dedicated memorization mechanism. We propose a new experimentation framework, SCoLe (Scaling Continual Learning), to study the knowledge retention and accumulation of algorithms over potentially infinite sequences of tasks. To explore this setting, we ran a large number of experiments on sequences of 1,000 tasks to better understand this new family of settings. We also propose slight modifications to vanilla stochastic gradient descent that facilitate continual learning in this setting. The SCoLe framework is a good simulation of practical training environments with recurring situations and enables the study of convergence behavior over long task sequences. Our experiments show that results obtained on short scenarios cannot always be extrapolated to longer ones.
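To make the setting concrete, the sketch below illustrates one way a SCoLe-style long sequence of tasks with recurring data could be simulated. It is only a minimal assumed example, not the paper's exact protocol: the "finite environment" is stood in for by synthetic class-conditional Gaussian data, each task samples 2 of 10 classes, and the model is trained with plain SGD for a few steps per task over 1,000 tasks, so every class reoccurs many times across the sequence.

```python
# Minimal sketch of a SCoLe-style long task sequence (illustrative assumptions,
# not the paper's exact setup): each task re-samples a small subset of classes
# from a fixed, finite environment, so data reoccurs over the sequence, and the
# model is trained with vanilla SGD throughout, with no continual learning method.
import torch
import torch.nn as nn

NUM_CLASSES, FEAT_DIM, NUM_TASKS = 10, 32, 1000

# Fixed "environment": class-conditional Gaussians (stand-in for a real dataset).
class_means = torch.randn(NUM_CLASSES, FEAT_DIM)

def sample_task(classes, n_per_class=64):
    """Draw a task dataset containing only the given subset of classes."""
    xs, ys = [], []
    for c in classes:
        xs.append(class_means[c] + 0.5 * torch.randn(n_per_class, FEAT_DIM))
        ys.append(torch.full((n_per_class,), c))
    return torch.cat(xs), torch.cat(ys)

model = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for task_id in range(NUM_TASKS):
    # Each task exposes only 2 of the 10 classes; over 1,000 tasks each class
    # reoccurs often, which is what allows knowledge to be retained and accumulated.
    task_classes = torch.randperm(NUM_CLASSES)[:2].tolist()
    x, y = sample_task(task_classes)
    for _ in range(5):  # a few SGD steps per task
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```

Under these assumptions, knowledge accumulation can be measured by periodically evaluating the model on held-out data from all classes, not just the current task's, and tracking how accuracy evolves over the 1,000 tasks.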