A large body of research in continual learning is devoted to overcoming the catastrophic forgetting of neural networks by designing new algorithms that are robust to distribution shifts. However, the majority of these works are strictly focused on the "algorithmic" part of continual learning for a "fixed neural network architecture", and the implications of using different architectures are mostly neglected. Even the few existing continual learning methods that modify the model assume a fixed architecture and aim to develop an algorithm that uses the model efficiently throughout the learning experience. In this work, we show that the choice of architecture can significantly impact continual learning performance, and that different architectures lead to different trade-offs between the ability to remember previous tasks and the ability to learn new ones. Moreover, we study the impact of various architectural decisions, and our findings yield best practices and recommendations that can improve continual learning performance.