Almost all state-of-the-art neural networks for computer vision tasks are trained in two stages: (1) pre-training on a large-scale dataset and (2) fine-tuning on the target dataset. This strategy reduces dependence on the target dataset and improves both convergence rate and generalization on the target task. Although pre-training on large-scale datasets is highly beneficial, its foremost disadvantage is the high training cost. To address this, we propose efficient filtering methods that select relevant subsets of the pre-training dataset. Additionally, we find that lowering image resolution during pre-training offers a favorable trade-off between cost and performance. We validate our techniques by pre-training on ImageNet in both unsupervised and supervised settings and fine-tuning on a diverse collection of target datasets and tasks. Our proposed methods drastically reduce pre-training cost while providing strong performance boosts. Finally, we improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and by pre-training on a dataset filtered from a larger-scale dataset.
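To make the resolution-reduction idea concrete, the sketch below illustrates a generic low-resolution pre-training loop in PyTorch. It is a minimal illustration under assumptions of our own, not the paper's exact recipe: the 112x112 resolution, the ResNet-50 backbone, the hyperparameters, and the dataset path are all placeholders chosen for the example.

```python
# Minimal sketch: pre-training at reduced resolution to cut cost.
# The resolution, model, hyperparameters, and dataset path below are
# illustrative assumptions, not values taken from the paper.
import torch
import torch.nn as nn
from torchvision import datasets, transforms, models

LOW_RES = 112  # hypothetical reduced pre-training resolution (standard ImageNet uses 224)

pretrain_tf = transforms.Compose([
    transforms.RandomResizedCrop(LOW_RES),  # smaller crops -> cheaper forward/backward passes
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical path to a (possibly filtered) ImageNet subset.
pretrain_set = datasets.ImageFolder("/path/to/imagenet_subset/train", transform=pretrain_tf)
loader = torch.utils.data.DataLoader(pretrain_set, batch_size=256, shuffle=True, num_workers=8)

model = models.resnet50(num_classes=len(pretrain_set.classes))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

# One abbreviated pre-training step; in practice this runs for many epochs,
# after which the weights are fine-tuned on the target dataset, typically at full resolution.
model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    break  # sketch: single step only
```

The cost saving comes from the smaller input size: convolutional feature maps shrink quadratically with resolution, so each pre-training step is substantially cheaper, while the learned representations can still transfer to the target task during fine-tuning.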