The continual learning setting aims to learn new tasks over time without forgetting the previous ones. The literature reports several significant efforts to tackle this problem with limited or no access to previous task data. Among such efforts, typical solutions offer sophisticated techniques involving memory replay, knowledge distillation, model regularization, and dynamic network expansion. The resulting methods incur a retraining cost for each new task, dedicated memory requirements, and setting-specific design choices. In this work, we show that a frozen CLIP (Contrastive Language-Image Pretraining) model offers astounding continual learning performance without any fine-tuning (zero-shot evaluation). We evaluate CLIP under a variety of settings, including class-incremental, domain-incremental, and task-agnostic incremental learning, on five popular benchmarks (ImageNet-100 & 1K, CORe50, CIFAR-100, and TinyImageNet). Without any bells and whistles, the CLIP model outperforms state-of-the-art continual learning approaches in the majority of the settings. We further study the effect of varying the text inputs of simple prompt templates on CLIP's performance. To the best of our knowledge, this is the first work to report CLIP's zero-shot performance in a continual setting. We advocate the use of this strong yet embarrassingly simple baseline for future comparisons in continual learning tasks.
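For illustration, the sketch below shows how zero-shot classification with a frozen CLIP model and a simple prompt template works in practice. It is not the authors' evaluation code: it assumes OpenAI's `clip` package, and the image path and class names are placeholders rather than the benchmarks above. Because the model stays frozen, extending the class list as new tasks arrive requires no retraining or replay memory.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a frozen, pretrained CLIP model (no fine-tuning is performed).
model, preprocess = clip.load("ViT-B/32", device=device)

# Class names seen so far; in an incremental setting this list simply grows
# as new tasks arrive. "a photo of a {}" is one simple prompt template.
class_names = ["apple", "bicycle", "castle"]  # placeholder labels
prompts = [f"a photo of a {name}" for name in class_names]

text_tokens = clip.tokenize(prompts).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text_tokens)

    # Cosine similarity between the image and each class prompt.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

pred = class_names[probs.argmax(dim=-1).item()]
print(f"Predicted class: {pred}")
```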