Alleviating Representational Shift for Continual Fine-tuning

We study a practical setting of continual learning: continually fine-tuning a pre-trained model. Previous work has found that, when a network is trained on new tasks, the features (penultimate-layer representations) of previous data change, a phenomenon called representational shift. Beyond this shift in features, we show that the intermediate layers' representational shift (IRS) also matters: it disrupts batch normalization, which is another crucial cause of catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning method with two components, cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution running means instead of post-convolution ones and recovers the post-convolution means before testing, which corrects the inaccurate mean estimates caused by IRS. Hierarchical fine-tuning uses a multi-stage strategy to fine-tune the pre-trained network, preventing large changes in the Conv layers and thus alleviating IRS. Experimental results on four datasets show that our method substantially outperforms several state-of-the-art methods while requiring lower storage overhead.
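The idea behind Xconv BN, recovering a post-convolution mean from a stored pre-convolution mean, rests on the linearity of convolution. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a 1x1 convolution (a per-position linear map, where the recovery is exact) and assumes the inputs to this layer stay fixed while its weights change. All names and values (`conv1x1`, `channel_mean`, `w_old`, `w_new`, the random data) are hypothetical.

```python
import random

def conv1x1(weight, bias, x):
    """Apply a 1x1 convolution (a per-position linear map) to one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def channel_mean(vectors):
    """Per-channel mean over a list of feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def max_abs_diff(a, b):
    """Largest per-channel absolute difference between two mean vectors."""
    return max(abs(p - q) for p, q in zip(a, b))

random.seed(0)
c_in, c_out = 3, 2

# Hypothetical pre-convolution features of old-task data (one vector per position).
old_inputs = [[random.gauss(1.0, 0.5) for _ in range(c_in)] for _ in range(200)]

# Conv weights at the time the old task was learned.
w_old = [[random.uniform(-1, 1) for _ in range(c_in)] for _ in range(c_out)]
b_old = [0.1, -0.2]

# Standard BN stores the post-convolution mean, computed with the old weights.
stale_post_mean = channel_mean([conv1x1(w_old, b_old, x) for x in old_inputs])

# The Xconv-style alternative stores the pre-convolution mean instead.
pre_mean = channel_mean(old_inputs)

# Fine-tuning on a new task changes the conv weights (simulated by a perturbation).
w_new = [[w + random.uniform(-0.3, 0.3) for w in row] for row in w_old]
b_new = [0.0, 0.3]

# Post-convolution mean the old data actually has under the new weights.
true_post_mean = channel_mean([conv1x1(w_new, b_new, x) for x in old_inputs])

# Recover it from the stored pre-convolution mean: mean(Wx + b) = W mean(x) + b.
recovered_post_mean = conv1x1(w_new, b_new, pre_mean)

err_stale = max_abs_diff(stale_post_mean, true_post_mean)
err_recovered = max_abs_diff(recovered_post_mean, true_post_mean)
print("stale post-conv mean error:", err_stale)
print("recovered mean error:", err_recovered)
```

In this toy setting the recovered mean matches the true post-convolution mean up to floating-point error, while the stored post-convolution mean is stale. For kernels larger than 1x1 with padding, or when the layer's inputs also shift, the recovery is only approximate.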

