An Empirical Investigation of the Role of Pre-training in Lifelong Learning

The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning, but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel dataset of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness in order to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach leads to performance comparable to the state-of-the-art in task-sequential continual learning across multiple settings, without retaining a memory that scales in size with the number of tasks.
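The idea of jointly optimizing the current task loss and the sharpness of the loss basin can be illustrated with a sharpness-aware update. The following is a minimal sketch, not the authors' implementation: it assumes a sharpness-aware minimization (SAM) style step, and the toy quadratic `loss`, its gradient `grad`, the function `sam_step`, and the values of `lr` and `rho` are all hypothetical choices made for illustration. Each step first moves the weights a distance `rho` along the gradient direction to approximate the worst-case point nearby, then descends using the gradient measured at that perturbed point.

```python
import math


def loss(w):
    # Hypothetical toy loss: a quadratic bowl that is steep along w[0]
    # and shallow along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 0.1 * w[1]]


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One sharpness-aware update (illustrative assumption, see lead-in)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # At a stationary point there is no ascent direction; keep w.
        return list(w)
    # Ascent step: approximate the highest-loss point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w, grad)
```

In a sequential fine-tuning setting, a step of this kind would replace the plain gradient step on each task, with the gradients computed on that task's mini-batches. Because it needs only the current weights and gradients, it stores nothing that grows with the number of tasks.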
