The computer vision world has seen renewed enthusiasm for various pre-trained models, including both classical ImageNet supervised pre-training and the recently emerged self-supervised pre-training such as SimCLR and MoCo. Pre-trained weights often boost a wide range of downstream tasks, including classification, detection, and segmentation. The latest studies suggest that pre-training benefits from gigantic model capacity. We are hereby curious and ask: after pre-training, does a model indeed have to stay large to preserve its universal downstream transferability? In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH). LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch yet reach the full models' performance. We extend the scope of LTH and question whether matching subnetworks still exist in pre-trained models that enjoy the same downstream transfer performance. Our extensive experiments convey an overall positive message: from all pre-trained weights obtained by ImageNet classification, SimCLR, and MoCo, we are consistently able to locate such matching subnetworks at 59.04% to 96.48% sparsity that transfer universally to multiple downstream tasks, whose performance sees no degradation compared to using the full pre-trained weights. Further analyses reveal that subnetworks found from different pre-training tend to yield diverse mask structures and perturbation sensitivities. We conclude that the core LTH observations remain generally relevant in the pre-training paradigm of computer vision, but more delicate discussions are needed in some cases. Code and pre-trained models will be made available at: https://github.com/VITA-Group/CV_LTH_Pre-training.
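To make the LTH procedure concrete, below is a minimal sketch of iterative magnitude pruning (IMP) applied to a pre-trained backbone: prune a fraction of the smallest-magnitude weights, rewind the surviving weights to the pre-trained values, and repeat. The ResNet-50 backbone, the 20%-per-round pruning rate, the number of rounds, and the `train_downstream` placeholder are illustrative assumptions for this sketch, not the paper's exact recipe (see the repository above for the authors' implementation).

```python
# Minimal IMP sketch on ImageNet-pretrained weights (assumptions noted above).
import copy
import torch
import torchvision
import torch.nn.utils.prune as prune

# Load a pre-trained backbone (newer torchvision weights API).
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
rewind_state = copy.deepcopy(model.state_dict())  # pre-trained weights to rewind to

# Prune all convolutional weights globally by L1 magnitude.
conv_params = [(m, "weight") for m in model.modules()
               if isinstance(m, torch.nn.Conv2d)]

for round_idx in range(10):  # ~89% sparsity after 10 rounds of 20% pruning
    # train_downstream(model)  # hypothetical fine-tuning step on a downstream task

    # Remove 20% of the *remaining* weights with the smallest magnitudes.
    prune.global_unstructured(conv_params,
                              pruning_method=prune.L1Unstructured,
                              amount=0.2)

    # Rewind surviving weights to their pre-trained values; the binary masks
    # (stored as `weight_mask` buffers) are kept intact.
    for name, param in model.named_parameters():
        key = name.replace("_orig", "")  # pruned params are renamed "*.weight_orig"
        if key in rewind_state:
            param.data.copy_(rewind_state[key])

# The resulting sparse subnetwork (mask + pre-trained weights) is then
# fine-tuned on each downstream task to test whether it "matches" the
# performance of the full pre-trained model.
```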