We study how different output layer parameterizations of a deep neural network affect learning and forgetting in continual learning settings. The following three effects can cause catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insight into how changing the output layer parameterization can address (1) and (2). We propose potential solutions to these issues and evaluate them in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drift and/or the amount of data available. In particular, in some cases where a standard linear layer would fail, changing the parameterization alone is sufficient to achieve significantly better performance, without introducing any continual-learning algorithm and while training the model with plain SGD. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios and suggest a way of selecting the best type of output layer for a given scenario.
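To make the notion of output layer parameterization concrete, the sketch below contrasts a standard linear head with one illustrative alternative, a cosine-similarity (normalized) head. The specific alternative head and its formulation are assumptions for illustration only, not necessarily the parameterizations evaluated in this paper; both plug into the same backbone and can be trained with plain SGD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearHead(nn.Module):
    """Standard output layer: logits = W h + b."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.fc(h)

class CosineHead(nn.Module):
    """Illustrative alternative parameterization (an assumption, not
    necessarily one studied in the paper): cosine similarity between
    L2-normalized features and L2-normalized class weights, scaled by a
    learnable temperature; no bias term."""
    def __init__(self, feat_dim: int, num_classes: int, init_scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        h = F.normalize(h, dim=-1)
        w = F.normalize(self.weight, dim=-1)
        return self.scale * h @ w.t()

# Either head is used the same way with a feature extractor and plain SGD:
#   logits = head(backbone(x)); loss = F.cross_entropy(logits, y)
```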