We study a practical setting of continual learning: continually fine-tuning a pre-trained model. Previous work has found that, when training on new tasks, the features (penultimate-layer representations) of previously seen data change, a phenomenon called representational shift. Beyond this feature shift, we reveal that intermediate-layer representational shift (IRS) also matters: it disrupts batch normalization and is another crucial cause of catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning method that incorporates two components, cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution running means instead of post-convolution ones and recovers the post-convolution means before testing, correcting the inaccurate mean estimates caused by IRS. Hierarchical fine-tuning fine-tunes the pre-trained network with a multi-stage strategy, preventing large changes in the convolutional layers and thus alleviating IRS. Experimental results on four datasets show that our method markedly outperforms several state-of-the-art methods while incurring lower storage overhead.
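The abstract does not give implementation details, but the Xconv BN idea can be illustrated with a minimal sketch: track the running mean of the input to a convolution rather than of its output, and before evaluation approximate the post-convolution running mean by pushing the pre-convolution mean through the convolution weights (valid for the mean because convolution is linear, ignoring padding effects). The class and method names (XConvBN2d, recover_post_conv_mean) are hypothetical and not from the paper; this is an assumption-laden sketch of one possible realization, not the authors' code.

```python
import torch
import torch.nn as nn


class XConvBN2d(nn.Module):
    """Sketch of a conv + BN block that tracks *pre*-convolution running means
    and reconstructs the post-convolution running mean before evaluation."""

    def __init__(self, conv: nn.Conv2d, momentum: float = 0.1, eps: float = 1e-5):
        super().__init__()
        self.conv = conv
        self.bn = nn.BatchNorm2d(conv.out_channels, momentum=momentum, eps=eps)
        # Running mean of the input to the convolution (one value per input channel).
        self.register_buffer("pre_mean", torch.zeros(conv.in_channels))
        self.momentum = momentum

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                # Per-input-channel batch mean, updated as an exponential moving average.
                batch_mean = x.mean(dim=(0, 2, 3))
                self.pre_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
        return self.bn(self.conv(x))

    @torch.no_grad()
    def recover_post_conv_mean(self) -> None:
        """Approximate the post-conv running mean from the pre-conv one,
        treating the input as spatially constant per channel."""
        w = self.conv.weight  # shape [C_out, C_in, kH, kW]
        post_mean = torch.einsum("c,ocij->o", self.pre_mean, w)
        if self.conv.bias is not None:
            post_mean = post_mean + self.conv.bias
        self.bn.running_mean.copy_(post_mean)
```

Under these assumptions, one would call recover_post_conv_mean() on each such block after fine-tuning on a new task and before evaluation, so that the BN statistics reflect the current convolution weights rather than stale post-convolution estimates.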