Large pre-trained, zero-shot-capable models have shown considerable success in standard transfer and adaptation tasks, and are particularly robust to distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness to distribution shifts. This is especially problematic for tasks such as Continual Learning (CL), where the model must adapt continuously as new task distributions are introduced sequentially. In this work, we show that where fine-tuning falls short in adapting such zero-shot-capable models, simple momentum-based weight interpolation provides consistent improvements on CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over $+4\%$ on standard CL benchmarks, while in parts more than halving the error gap to the upper bound of jointly training on all tasks at once, allowing the continual learner to inch closer to the joint-training limit.
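The core idea of momentum-based weight interpolation can be sketched as follows: alongside the fast, fine-tuned weights, a slow copy of the model is maintained and updated as an exponential moving average of the fast weights. This is a minimal, framework-free illustration, assuming weights are stored as dicts of parameter lists; the function name and the momentum value `tau` are illustrative, not taken from the paper.

```python
def momentum_interpolate(slow_weights, fast_weights, tau=0.99):
    """Update the slow (interpolated) weights toward the fast fine-tuned weights.

    Implements the exponential moving average
        slow <- tau * slow + (1 - tau) * fast
    element-wise over every parameter tensor (here: list of floats).
    A large tau keeps the slow model close to its previous (e.g. zero-shot)
    state, which is what preserves robustness under sequential fine-tuning.
    """
    return {
        name: [tau * s + (1.0 - tau) * f
               for s, f in zip(slow_weights[name], fast_weights[name])]
        for name in slow_weights
    }


# Toy usage: a single scalar parameter drifting from 1.0 toward 0.0.
slow = {"w": [1.0]}
fast = {"w": [0.0]}
slow = momentum_interpolate(slow, fast, tau=0.9)  # slow["w"][0] is now 0.9
```

In a CL loop, `fast_weights` would be updated by gradient steps on the current task, while the interpolated slow model is the one evaluated across all tasks.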