While self-supervised pretraining has proven beneficial for many computer vision tasks, it requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. Prior work demonstrates that models pretrained on datasets dissimilar to their target data, such as chest X-ray models trained on ImageNet, underperform models trained from scratch. Users who lack the resources to pretrain must use existing models with lower performance. This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model. Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data. Taken together, HPT provides a simple framework for obtaining better pretrained representations with fewer computational resources.
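The core idea described above, initializing self-supervised pretraining from an existing pretrained model rather than from random weights, can be sketched in a few lines of PyTorch. The listing below is only a minimal illustration under stated assumptions, not the authors' released implementation: the choice of ResNet-50 with torchvision's ImageNet weights, the projection-head sizes, the simplified InfoNCE-style loss (info_nce), and all hyperparameters are assumptions made for the example, and a recent torch/torchvision install is assumed.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

# Step 1 (HPT idea): reuse an existing pretrained model as the starting point
# instead of pretraining from random initialization. Here we assume
# torchvision's ImageNet-pretrained ResNet-50 as the base model.
encoder = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1
)
encoder.fc = nn.Identity()  # keep only the 2048-d feature extractor

# Step 2: attach a projection head (as in common contrastive methods) and
# continue self-supervised pretraining on the target-domain images.
projector = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(inplace=True),
    nn.Linear(512, 128),
)
model = nn.Sequential(encoder, projector)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

def info_nce(z1, z2, temperature=0.2):
    # Simplified contrastive objective for illustration: matched indices in
    # the two views are positives, all other batch items are negatives.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def pretraining_step(view_1, view_2):
    # One self-supervised update on two augmented views of the same batch.
    # Because the encoder starts from pretrained weights, this phase
    # typically needs far fewer iterations than training from scratch.
    loss = info_nce(model(view_1), model(view_2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for two augmented views of a
# batch of target-domain images.
v1, v2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
print(pretraining_step(v1, v2))

The same pattern generalizes to a hierarchy of pretraining stages (e.g., generalist pretrained weights, then domain-level self-supervised pretraining, then target-dataset pretraining) by repeating the initialize-then-continue step at each level.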