We study how different output layers in a deep neural network learn and forget in continual learning settings. The following three factors can affect catastrophic forgetting in the output layer: (1) weight modification, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insight into how changing the output layer's parameterization may address (1) and (2). We propose potential solutions to these issues and evaluate them in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drift and/or the amount of available data. In particular, in some cases where a standard linear layer would fail, changing the parameterization is sufficient to achieve significantly better performance, without introducing a dedicated continual-learning algorithm and while training the model with standard SGD. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios and suggest a way of selecting the best type of output layer for a given scenario.
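To make the notion of "changing the output layer's parameterization" concrete, the sketch below contrasts a standard linear head with one illustrative alternative, a weight-normalized (cosine-similarity) head. This is a minimal example assuming a PyTorch-style setup; the class names and the specific cosine head are illustrative assumptions, not the exact layers evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearHead(nn.Module):
    """Standard linear output layer: logits = W h + b."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.fc(h)


class CosineHead(nn.Module):
    """Illustrative alternative parameterization (assumption, not the paper's
    exact layer): logits are scaled cosine similarities between the feature
    vector and per-class weight vectors, with no bias. Normalizing both
    factors bounds the logit magnitudes, which is one way to limit
    interference between classes seen in different tasks."""

    def __init__(self, feat_dim: int, num_classes: int, scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        h = F.normalize(h, dim=-1)          # unit-norm features
        w = F.normalize(self.weight, dim=-1)  # unit-norm class weights
        return self.scale * h @ w.t()
```

Either head can be placed on top of the same feature extractor and trained with plain SGD, so the only difference between runs is the parameterization of the output layer.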