The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling up the pre-training data to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image, and contrastive image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of pre-training data source is essential for few-shot transfer, but that its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000x more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of the pre-training method, comparing language-image contrastive with image-image contrastive pre-training, and find that the latter leads to better downstream accuracy.
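As a rough illustration of the two contrastive pre-training objectives compared above, the sketch below shows a CLIP-style language-image loss (positives are image-caption pairs) next to a SimCLR-style image-image loss (positives are two augmented views of the same image). This is a generic sketch under assumed setups, not the paper's implementation; the encoder outputs, batch size, embedding dimension, and temperature are placeholder assumptions.

```python
# Minimal sketch (not from the paper) of the two contrastive objectives:
# language-image (CLIP-style) vs. image-image (SimCLR-style).
import torch
import torch.nn.functional as F


def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    z_a, z_b: (batch, dim) embeddings whose i-th rows form positive pairs;
    every other pairing in the batch is treated as a negative.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                    # (batch, batch) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)  # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


# Language-image contrastive: positives are (image, caption) pairs.
image_emb = torch.randn(8, 512)  # stand-in for image-encoder outputs
text_emb = torch.randn(8, 512)   # stand-in for text-encoder outputs
loss_language_image = info_nce(image_emb, text_emb)

# Image-image contrastive: positives are two augmented views of each image.
view1_emb = torch.randn(8, 512)  # stand-in for encoder(augment(x))
view2_emb = torch.randn(8, 512)  # stand-in for encoder(augment'(x))
loss_image_image = info_nce(view1_emb, view2_emb)

print(f"language-image loss: {loss_language_image.item():.3f}, "
      f"image-image loss: {loss_image_image.item():.3f}")
```

Both objectives share the same InfoNCE form; they differ only in how positive pairs are constructed, which is the axis the paper's method comparison varies.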