One aim shared across settings such as continual learning and transfer learning is to leverage previously acquired knowledge to converge faster on the current task. This is usually done through fine-tuning, which carries the implicit assumption that the network maintains its plasticity: the performance it can reach on any given task is not negatively affected by previously seen tasks. It has recently been observed that a model pretrained on data from the same distribution as the one it is fine-tuned on might not reach the same generalisation as a freshly initialised one. We build on and extend this observation, providing a hypothesis for the mechanics behind it. We discuss the implications of losing plasticity for continual learning, which relies heavily on optimising pretrained models.
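To make the comparison concrete, the following is a minimal sketch of the kind of experiment behind this observation: a warm-started network (pretrained on part of the data, then fine-tuned on all of it) against a freshly initialised one trained on the same final data. The synthetic task, architecture, and training loop are illustrative assumptions, not the paper's actual setup, and whether a generalisation gap appears will depend on the configuration.

```python
# Illustrative sketch (not the paper's protocol): compare a warm-started
# network against a freshly initialised one on the same final task.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(n):
    # Synthetic binary classification: label = sign of a fixed linear score.
    x = torch.randn(n, 20)
    y = (x @ torch.linspace(-1, 1, 20) > 0).long()
    return x, y

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

def train(model, x, y, epochs=50):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

x_all, y_all = make_data(2000)
x_test, y_test = make_data(1000)

# Warm start: pretrain on the first half, then fine-tune on the full set.
warm = train(make_model(), x_all[:1000], y_all[:1000])
warm = train(warm, x_all, y_all)

# Fresh baseline: train once from a new initialisation on the full set.
fresh = train(make_model(), x_all, y_all)

print(f"warm-started test acc: {accuracy(warm, x_test, y_test):.3f}")
print(f"fresh-init   test acc: {accuracy(fresh, x_test, y_test):.3f}")
```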