We study how different output layer types of a deep neural network learn and forget in continual learning settings. We describe the three factors affecting catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. Our goal is to provide insight into how different types of output layers can address (1) and (2). We also propose potential solutions and evaluate them on several benchmarks. We show that the best-performing output layer type depends on the data distribution drift and on the amount of data available. In particular, in some cases where a standard linear layer would fail, simply changing the parametrization is sufficient to achieve significantly better performance while still training with SGD. Our results and analysis shed light on the dynamics of the output layer in continual learning scenarios and help select the best-suited output layer for a given scenario.
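The abstract does not specify which alternative parametrization is meant; as a minimal illustrative sketch (an assumption, not the paper's method), the snippet below contrasts a standard linear head with a cosine-normalized head, a common reparametrization in continual learning that removes the bias term and the weight-magnitude component from the logits while remaining trainable with plain SGD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearHead(nn.Module):
    """Standard linear output layer: logits = W h + b."""
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.fc(h)

class CosineHead(nn.Module):
    """Illustrative alternative parametrization: logits are scaled
    cosine similarities between the embedding and per-class weight
    vectors (no bias, unit-norm weights and embeddings)."""
    def __init__(self, dim: int, num_classes: int, scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.scale = scale  # temperature-like scaling of cosine logits

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Normalize both embeddings and class weights, then take dot products.
        return self.scale * F.linear(
            F.normalize(h, dim=-1), F.normalize(self.weight, dim=-1)
        )
```

Both heads are drop-in replacements for each other and train with the same SGD loop; the normalized variant constrains how much per-class logits can grow, which is one way a change of parametrization alone can alter forgetting behavior.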