A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to distribution shifts. While the recent progress in the continual learning literature is encouraging, our understanding of which properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work we focus on the model itself and study the impact of the "width" of the neural network architecture on catastrophic forgetting, showing that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives, such as gradient orthogonality, sparsity, and the lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.
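As a rough illustration of the experimental question posed above (how does hidden-layer width affect forgetting?), the following is a minimal sketch, not the paper's code: an MLP of varying width is trained sequentially on two synthetic classification tasks, and forgetting is measured as the drop in first-task accuracy after training on the second task. The synthetic data, widths, and hyperparameters are illustrative assumptions only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(num_samples=1000, dim=20):
    # Each task gets its own random pair of class means, so the optimal
    # decision boundary changes from task to task (a simple distribution shift).
    means = torch.randn(2, dim) * 2.0
    y = torch.randint(0, 2, (num_samples,))
    x = means[y] + torch.randn(num_samples, dim)
    return x, y

def accuracy(model, x, y):
    # Accuracy on the task's own data (no train/test split, for simplicity).
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

def train(model, x, y, epochs=200, lr=0.05):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

task1, task2 = make_task(), make_task()

for width in [16, 64, 256, 1024]:
    # One-hidden-layer MLP; only the hidden width varies between runs.
    model = nn.Sequential(nn.Linear(20, width), nn.ReLU(), nn.Linear(width, 2))
    train(model, *task1)                  # learn the first task
    acc_before = accuracy(model, *task1)
    train(model, *task2)                  # then learn the second task, no replay or regularization
    acc_after = accuracy(model, *task1)   # re-evaluate the first task
    print(f"width={width:5d}  task-1 accuracy {acc_before:.2f} -> {acc_after:.2f}  "
          f"forgetting={acc_before - acc_after:+.2f}")
```

This toy setup only shows how a forgetting metric (accuracy drop on earlier tasks) can be computed while sweeping width; the trends reported in the work itself come from standard continual learning benchmarks and architectures, not from synthetic data like this.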