With the success of pretraining techniques in representation learning, a number of continual learning methods based on pretrained models have been proposed. Some of these methods design continual learning mechanisms on top of the pretrained representations and allow only minimal updates, or even no updates, to the backbone model during continual learning. In this paper, we question whether the complexity of these models is needed to achieve good performance by comparing them to a simple baseline that we design. We argue that the pretrained feature extractor itself can be strong enough to achieve competitive or even better continual learning performance on the Split-CIFAR-100 and CORe50 benchmarks. To validate this, we evaluate a very simple baseline that 1) uses the frozen pretrained model to extract image features for every class encountered during the continual learning stage and computes their corresponding mean features on the training data, and 2) predicts the class of each input based on the nearest-neighbor distance between the test sample and the class mean features; i.e., a Nearest Mean Classifier (NMC). This baseline is single-headed, exemplar-free, and can be task-free (by updating the means continually). It achieves 88.53% on 10-Split-CIFAR-100, surpassing most state-of-the-art continual learning methods that are all initialized from the same pretrained transformer model. We hope our baseline encourages future progress in designing learning systems that can continually improve the quality of their learned representations even when starting from pretrained weights.
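For concreteness, the sketch below shows what such an NMC baseline could look like in NumPy. It is a minimal illustration under stated assumptions, not the paper's exact implementation: the class name and the incremental-mean update are illustrative, and the input features are assumed to come from a frozen pretrained backbone (e.g., extracted with gradients disabled).

```python
import numpy as np

class NearestMeanClassifier:
    """Classify by the nearest class mean in a frozen feature space."""

    def __init__(self):
        self.means = {}   # class label -> running mean feature vector
        self.counts = {}  # class label -> number of samples seen

    def update(self, features, labels):
        # Incrementally update per-class means; no task boundaries are
        # required, which is what makes the baseline task-free.
        for f, y in zip(features, labels):
            if y not in self.means:
                self.means[y] = np.zeros_like(f, dtype=np.float64)
                self.counts[y] = 0
            self.counts[y] += 1
            self.means[y] += (f - self.means[y]) / self.counts[y]

    def predict(self, features):
        labels = list(self.means.keys())
        mean_matrix = np.stack([self.means[y] for y in labels])  # (C, D)
        # Euclidean distance from each test feature to each class mean.
        dists = np.linalg.norm(
            features[:, None, :] - mean_matrix[None, :, :], axis=-1
        )
        return [labels[i] for i in dists.argmin(axis=1)]
```

Because the backbone is frozen, `update` only needs one pass over each task's training features, and `predict` is a single nearest-neighbor lookup over the class means, which is what makes the baseline single-headed and exemplar-free.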