The "pre-training $\rightarrow$ downstream adaptation" presents both new opportunities and challenges for Continual Learning (CL). Although the recent state-of-the-art in CL is achieved through Parameter-Efficient-Tuning (PET) adaptation paradigm, only prompt has been explored, limiting its application to Transformers only. In this paper, we position prompting as one instantiation of PET, and propose a unified CL framework with general PET, dubbed as Learning-Accumulation-Ensemble (LAE). PET, e.g., using Adapter, LoRA, or Prefix, can adapt a pre-trained model to downstream tasks with fewer parameters and resources. Given a PET method, our LAE framework incorporates it for CL with three novel designs. 1) Learning: the pre-trained model adapts to the new task by tuning an online PET module, along with our adaptation speed calibration to align different PET modules, 2) Accumulation: the task-specific knowledge learned by the online PET module is accumulated into an offline PET module through momentum update, 3) Ensemble: During inference, we respectively construct two experts with online/offline PET modules (which are favored by the novel/historical tasks) for prediction ensemble. We show that LAE is compatible with a battery of PET methods and gains strong CL capability. For example, LAE with Adaptor PET surpasses the prior state-of-the-art by 1.3% and 3.6% in last-incremental accuracy on CIFAR100 and ImageNet-R datasets, respectively.