Sequential training from task to task is becoming one of the major subjects in deep learning applications such as continual learning and transfer learning. Nevertheless, it remains unclear under what conditions the trained model's performance improves or deteriorates. To deepen our understanding of sequential training, this study provides a theoretical analysis of generalization performance in a solvable case of continual learning. We consider neural networks in the neural tangent kernel (NTK) regime that continually learn target functions from task to task, and investigate the generalization by using an established statistical mechanical analysis of kernel ridge-less regression. We first show characteristic transitions from positive to negative transfer: targets whose similarity exceeds a specific critical value achieve positive knowledge transfer to the subsequent task, while catastrophic forgetting occurs even between very similar targets. Next, we investigate a variant of continual learning in which the model learns the same target function in multiple tasks. Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task. We prove that the generalization error decreases monotonically from task to task when the sample sizes are equal, whereas unbalanced sample sizes deteriorate generalization. We refer to this improvement and deterioration as self-knowledge transfer and forgetting, respectively, and empirically confirm them in realistic training of deep neural networks as well.
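To make the sequential-training setting concrete, here is a minimal sketch of kernel ridge-less regression applied task by task: each task is fit to zero training error starting from the previous task's function, which is what gradient-descent training does in the NTK regime. An RBF kernel is used as a stand-in for the NTK, and the helper names (`sequential_ridgeless_fit`, `target_A`, `target_B`) are illustrative assumptions, not from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Stand-in for the NTK; any fixed positive-definite kernel follows the same recursion.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sequential_ridgeless_fit(tasks, kernel, eps=1e-8):
    """Sequential kernel ridge-less regression: each task starts from the previous
    task's function and interpolates the residuals on its own samples."""
    fitted = []  # list of (X_t, alpha_t) pairs defining the running predictor

    def predict(X):
        out = np.zeros(len(X))
        for X_t, alpha_t in fitted:
            out += kernel(X, X_t) @ alpha_t
        return out

    for X_t, y_t in tasks:
        residual = y_t - predict(X_t)                 # what the current model gets wrong
        K = kernel(X_t, X_t) + eps * np.eye(len(X_t))  # tiny jitter for numerical stability
        fitted.append((X_t, np.linalg.solve(K, residual)))
    return predict

# Toy usage: two tasks whose targets are similar but not identical.
rng = np.random.default_rng(0)
d, n = 5, 100
target_A = lambda X: np.sin(X @ np.ones(d))
target_B = lambda X: np.sin(X @ np.ones(d)) + 0.1 * X[:, 0]
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
f = sequential_ridgeless_fit([(X1, target_A(X1)), (X2, target_B(X2))], rbf_kernel)

X_test = rng.normal(size=(1000, d))
# Error on the first target after training on both tasks gauges forgetting of task 1.
print("test error on task-1 target:", np.mean((f(X_test) - target_A(X_test)) ** 2))
```

Varying the similarity between `target_A` and `target_B`, or the sample size of each task, in this toy setup mimics the transfer/forgetting trade-offs analyzed in the paper.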