The rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL in the raw-data space and in the latent space of pre-trained encoders. Second, we investigate how the characteristics of the encoder, the pre-training algorithm and data, and the resulting latent space affect CL performance. For this, we compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space. Notably, this study shows how transfer, forgetting, task similarity, and learning depend on the characteristics of the input data and not necessarily on the CL algorithm. First, we show that under some circumstances reasonable CL performance can readily be achieved with a non-parametric classifier at negligible compute. We then show that models pre-trained on broader data result in better performance for various replay sizes, and we explain this with the representational similarity and transfer properties of these representations. Finally, we show the effectiveness of self-supervised pre-training for downstream domains that are out-of-distribution relative to the pre-training domain. We point out and validate several research directions that can further increase the efficacy of latent CL, including representation ensembling. The diverse set of datasets used in this study can serve as a compute-efficient playground for further CL research. The codebase is available at https://github.com/oleksost/latent_CL.
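As a rough illustration of the latent-replay setting described above, the sketch below freezes a pre-trained encoder, trains only a linear head on a mix of current and buffered latent vectors, and adds a nearest-class-mean predictor as an example of a non-parametric classifier in latent space. The backbone choice (a torchvision ResNet-50), the buffer budget, and the class count are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of vanilla replay in the latent space of a frozen encoder.
# Backbone, buffer budget, and class count are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen pre-trained encoder: CL happens entirely in its latent space,
# so the expensive backbone forward pass is done once per example.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
encoder = nn.Sequential(*list(backbone.children())[:-1]).to(device).eval()
for p in encoder.parameters():
    p.requires_grad_(False)

feat_dim, num_classes = 2048, 100            # assumed downstream class count
classifier = nn.Linear(feat_dim, num_classes).to(device)
optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

buffer_z, buffer_y = [], []                  # replay buffer of latent vectors
BUFFER_PER_TASK = 500                        # hypothetical replay budget

@torch.no_grad()
def encode(x):
    """Map raw images to fixed latent features."""
    return encoder(x.to(device)).flatten(1).cpu()

def train_task(task_loader):
    """One task of vanilla latent replay: current latents mixed with buffered ones."""
    seen_z, seen_y = [], []
    for x, y in task_loader:
        z = encode(x)
        seen_z.append(z); seen_y.append(y)
        z, y = z.to(device), y.to(device)
        if buffer_z:                         # replay a random batch of stored latents
            idx = torch.randint(len(buffer_z), (z.size(0),))
            z = torch.cat([z, torch.stack([buffer_z[i] for i in idx]).to(device)])
            y = torch.cat([y, torch.stack([buffer_y[i] for i in idx]).to(device)])
        loss = criterion(classifier(z), y)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    # Naive storage policy: keep the first BUFFER_PER_TASK latents of the task.
    all_z, all_y = torch.cat(seen_z), torch.cat(seen_y)
    buffer_z.extend(all_z[:BUFFER_PER_TASK])
    buffer_y.extend(all_y[:BUFFER_PER_TASK])

@torch.no_grad()
def nmc_predict(x, class_means):
    """Non-parametric nearest-class-mean classification in latent space."""
    z = encode(x).to(device)
    return torch.cdist(z, class_means.to(device)).argmin(dim=1)
```

Because the encoder is frozen, the trainable state and the replay buffer both live in the low-dimensional latent space, which is what makes this setting far cheaper in compute and memory than replay in the raw-data space.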