Fine-tuning a pre-trained model on the target data is a dominant paradigm in many deep learning applications, especially for small data sets. However, recent studies have empirically shown that, for some vision tasks, training from scratch achieves final performance no worse than this pre-training strategy once the number of training samples is increased. In this work, we revisit this phenomenon from the perspective of generalization analysis, using the excess risk bound that is popular in learning theory. The result reveals that the excess risk bound may depend only weakly on the pre-trained model. This observation motivates us to leverage the pre-training data itself during fine-tuning, since this data is also available at fine-tuning time. The resulting generalization analysis shows that the excess risk bound on a target task can be improved when appropriate pre-training data is included in fine-tuning. Guided by this theoretical motivation, we propose a novel selection strategy that chooses a subset of the pre-training data to improve generalization on the target task. Extensive experiments on image classification tasks across 8 benchmark data sets verify the effectiveness of the proposed data-selection-based fine-tuning pipeline.
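For reference, the analysis is phrased in terms of the standard excess risk from statistical learning theory; the display below is a generic statement of this quantity, with the loss $\ell$, hypothesis class $\mathcal{F}$, and target distribution $\mathcal{D}$ used as placeholder notation rather than notation taken from the paper itself:

$$
R(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\!\left[\ell\bigl(f(x),\, y\bigr)\right],
\qquad
\mathcal{E}(\hat{f}) \;=\; R(\hat{f}) \;-\; \inf_{f\in\mathcal{F}} R(f),
$$

where $\hat{f}$ denotes the predictor returned by fine-tuning (or by training from scratch) and $\mathcal{E}(\hat{f})$ is the excess risk whose upper bound is studied. A weak dependence of such a bound on the pre-trained initialization is what motivates bringing selected pre-training data into the fine-tuning objective.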